Systems and Methods of Interacting with a Virtual Grid in a Three-dimensional (3D) Sensory Space

ABSTRACT

The technology disclosed relates to selecting a virtual item from a virtual grid in a three-dimensional (3D) sensory space. It also relates to navigating a virtual modality displaying a plurality of virtual items arranged in a grid by automatically selecting a virtual item in the virtual grid at a terminal end of a control gesture of a control object, responsive to a terminal gesture that transitions the control object from one physical arrangement to another. In one implementation, the control object is a hand. In some implementations, physical arrangements of the control object include at least a flat hand with thumb parallel to fingers, closed, half-open, pinched, curled, fisted, mime gun, okay sign, thumbs-up, ILY sign, one-finger point, two-finger point, thumb point, or pinkie point.

PRIORITY DATA

This application is a continuation of U.S. patent application Ser. No. 15/832,697, entitled “SYSTEMS AND METHODS OF INTERACTING WITH A VIRTUAL GRID IN A THREE-DIMENSIONAL (3D) SENSORY SPACE”, filed 5 Dec. 2017 (Attorney Docket No. ULTI 1031-3), which is a continuation of U.S. patent application Ser. No. 14/625,632, entitled “SYSTEMS AND METHODS OF INTERACTING WITH A VIRTUAL GRID IN A THREE-DIMENSIONAL (3D) SENSORY SPACE”, filed 19 Feb. 2015 (Attorney Docket No. LEAP 1031-2/LPM-1031US1), which claims the benefit of U.S. Provisional Patent Application No. 62/007,885, entitled “SYSTEMS AND METHODS OF INTERACTING WITH A VIRTUAL GRID IN A THREE-DIMENSIONAL (3D) SENSORY SPACE,” filed 4 Jun. 2014 (Attorney Docket No. LEAP 1031-1/LPM-1031PR). The applications are hereby incorporated by reference for all purposes.

FIELD OF THE TECHNOLOGY DISCLOSED

The technology disclosed relates, in general, to augmented reality and virtual reality, and in particular implementations, to facilitating gestural interactions with a virtual object in a three-dimensional (3D) sensory space.

INCORPORATIONS

Materials incorporated by reference in this filing include the following:

“PREDICTIVE INFORMATION FOR FREE SPACE GESTURE CONTROL AND COMMUNICATION”, U.S. Prov. App. No. 61/873,758, filed 4 Sep. 2013 (Attorney Docket No. LEAP 1007-1/LPM-1007APR),

“VELOCITY FIELD INTERACTION FOR FREE SPACE GESTURE INTERFACE AND CONTROL”, U.S. Prov. App. No. 61/891,880, filed 16 Oct. 2013 (Attorney Docket No. LEAP 1008-1/1009APR),

“INTERACTIVE TRAINING RECOGNITION OF FREE SPACE GESTURES FOR INTERFACE AND CONTROL”, U.S. Prov. App. No. 61/872,538, filed 30 Aug. 2013 (Attorney Docket No. LPM-013GPR),

“DRIFT CANCELLATION FOR PORTABLE OBJECT DETECTION AND TRACKING”, U.S. Prov. App. No. 61/938,635, filed 11 Feb. 2014 (Attorney Docket No. LEAP 1037-1/LPM-1037PR),

“SAFETY FOR WEARABLE VIRTUAL REALITY DEVICES VIA OBJECT DETECTION AND TRACKING”, U.S. Prov. App. No. 61/981,162, filed 17 Apr. 2014 (Attorney Docket No. LEAP 1050-1/LPM-1050PR),

“WEARABLE AUGMENTED REALITY DEVICES WITH OBJECT DETECTION AND TRACKING”, U.S. Prov. App. No. 62/001,044, filed 20 May 2014 (Attorney Docket No. LEAP 1061-1/LPM-1061PR),

“METHODS AND SYSTEMS FOR IDENTIFYING POSITION AND SHAPE OF OBJECTS IN THREE-DIMENSIONAL SPACE”, U.S. Prov. App. No. 61/587,554, filed 17 Jan. 2012 (Attorney Docket No. PA5663PRV),

“SYSTEMS AND METHODS FOR CAPTURING MOTION IN THREE-DIMENSIONAL SPACE”, U.S. Prov. App. No. 61/724,091, filed 8 Nov. 2012 (Attorney Docket No. LPM-001PR2/7312201010),

“NON-TACTILE INTERFACE SYSTEMS AND METHODS”, U.S. Prov. App. No. 61/816,487, filed 26 Apr. 2013 (Attorney Docket No. LPM-028PR),

“DYNAMIC USER INTERACTIONS FOR DISPLAY CONTROL”, U.S. Prov. App. No. 61/752,725, filed 15 Jan. 2013 (Attorney Docket No. LPM-013APR),

“VEHICLE MOTION SENSORY CONTROL”, U.S. Prov. App. No. 62/005,981, filed 30 May 2014 (Attorney Docket No. LEAP 1052-1/LPM-1052PR),

“MOTION CAPTURE USING CROSS-SECTIONS OF AN OBJECT”, U.S. application Ser. No. 13/414,485, filed 7 Mar. 2012 (Attorney Docket No. LEAP 1006-7/LPM-1006US), and

“SYSTEM AND METHODS FOR CAPTURING MOTION IN THREE-DIMENSIONAL SPACE”, U.S. application Ser. No. 13/742,953, filed 16 Jan. 2013 (Attorney Docket No. LPM-001CP2/7312204002).

BACKGROUND

The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves may also correspond to implementations of the claimed technology.

Augmented Reality (AR) technology refers to the real-time registration of 2D or 3D computer-generated imagery onto a live view of a real-world physical space. A user is able to view and interact with the augmented imagery in such a way as to manipulate the virtual objects in their view.

However, existing human-AR interactions are very limited and unfeasible. Current AR systems are complex, as they force the user to interact with the AR environment using a keyboard and mouse, or a vocabulary of simple hand gestures. Further, despite strong academic and commercial interest in AR systems, AR systems continue to be costly and to require expensive equipment, and thus remain unsuitable for general use by the average consumer.

An opportunity arises to provide an economical approach that delivers the advantages of AR for enhanced, sub-millimeter precision interaction with virtual objects without the drawbacks of attaching or deploying specialized hardware.

SUMMARY

The technology disclosed relates to selecting a virtual item from a virtual grid in a three-dimensional (3D) sensory space. In particular, it relates to generating a virtual grid with a plurality of grid lines and corresponding plurality of virtual items responsive to gestures in a three-dimensional (3D) sensory space, detecting a gesture in the 3D sensory space and interpreting the gesture as selecting one of the virtual items, and automatically reporting the selection to a further computer-implemented process.

The technology disclosed also relates to navigating a virtual modality displaying a plurality of virtual items arranged in a grid. In particular, it relates to detecting a first sweep of a control object responsive to a first control gesture in a three-dimensional (3D) sensory space, defining an extent of translation along a first axis of a virtual grid in proportion to length of the first sweep of the control object, detecting a second sweep of the control object responsive to a second control gesture in the 3D sensory space, defining an extent of translation along a second axis of the virtual grid in proportion to length of the second sweep of the control object, and automatically selecting a virtual item in the virtual grid at a terminal end of the second sweep.

The technology disclosed further relates to navigating a virtual modality displaying a plurality of virtual items arranged in a grid. In particular, it relates to detecting a horizontal sweep of a control object responsive to a first control gesture in a three-dimensional (3D) sensory space, defining a horizontal extent of translation along a first axis of a virtual grid in proportion to length of the horizontal sweep of the control object, detecting a vertical sweep of the control object responsive to a second control gesture in the 3D sensory space, defining a vertical extent of translation along a second axis of the virtual grid in proportion to length of the vertical sweep of the control object, wherein the second axis is about perpendicular to the first axis, and automatically selecting a virtual item in the virtual grid at a terminal end of the vertical sweep responsive to a terminal gesture that transitions the control object from one physical arrangement to another.
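
By way of a non-limiting illustration, the following Python sketch shows one way the two proportional sweeps and the terminal-gesture selection described above could be modeled. The class name GridNavigator, the cells_per_meter scale factor, and the arrangement labels are illustrative assumptions rather than elements of the claimed method.

    # Illustrative sketch: map two sweep gestures onto a virtual grid and select the
    # item under the terminal end of the second sweep when the control object changes
    # physical arrangement. Names and scale factors are assumptions for illustration.
    class GridNavigator:
        def __init__(self, rows, cols, cells_per_meter=10.0):
            self.rows, self.cols = rows, cols
            self.scale = cells_per_meter   # grid cells traversed per meter of sweep
            self.row, self.col = 0, 0

        def horizontal_sweep(self, sweep_length_m):
            # Translate along the first axis in proportion to sweep length.
            self.col = max(0, min(self.cols - 1,
                                  self.col + round(sweep_length_m * self.scale)))

        def vertical_sweep(self, sweep_length_m):
            # Translate along the second (about perpendicular) axis.
            self.row = max(0, min(self.rows - 1,
                                  self.row + round(sweep_length_m * self.scale)))

        def maybe_select(self, previous_arrangement, current_arrangement, grid):
            # Select only when the control object transitions between arrangements,
            # e.g., from "flat hand" to "pinched".
            if previous_arrangement != current_arrangement:
                return grid[self.row][self.col]
            return None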

In one implementation, the control object is a hand. In some implementations, physical arrangements of the control object include at least a flat hand with thumb about parallel to fingers. In some other implementations, physical arrangements of the control object include at least open, closed, and half-open. In yet other implementations, physical arrangements of the control object include at least pinched, curled, and fisted. In other implementations, physical arrangements of the control object include at least mime gun, okay sign, thumbs-up, and ILY sign. In yet other implementations, physical arrangements of the control object include at least one-finger point, two-finger point, thumb point, and pinkie point.

Other aspects and advantages of the technology disclosed can be seen on review of the drawings, the detailed description and the claims, which follow.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings, like reference characters generally refer to like parts throughout the different views. Also, the drawings are not necessarily to scale, with an emphasis instead generally being placed upon illustrating the principles of the technology disclosed. In the following description, various implementations of the technology disclosed are described with reference to the following drawings, in which:

FIG. 1 illustrates an exemplary gesture-recognition system.

FIG. 2 is a simplified block diagram of a computer system implementing a gesture-recognition apparatus according to an implementation of the technology disclosed.

FIG. 3A shows one implementation of a virtual modality hosting a virtual grid that includes virtual slips.

FIG. 3B illustrates one implementation of an augmented reality (AR) environment created by instantiation of a free-floating virtual modality in a real-world physical space.

FIG. 4A is one implementation of navigating a virtual modality using a vertical sweep.

FIG. 4B illustrates one implementation of navigating a virtual modality using a horizontal sweep and selecting a virtual object in the virtual modality.

FIG. 5A shows one implementation of identifying, for selection, a particular virtual object in a virtual modality by placing a hand behind or underneath the particular virtual object.

FIG. 5B illustrates one implementation of selecting a particular virtual object in a virtual modality using a scooping gesture.

FIG. 6 illustrates one implementation of selecting a particular virtual object in response to a transition of physical arrangement of a hand from clenched fist to open hand.

FIG. 7 is one implementation of selecting a particular virtual object in response to a transition of physical arrangement of a hand from flat-hand hovering gesture to pinching gesture.

FIG. 8 depicts one implementation of selecting a particular virtual object in response to a transition of physical arrangement of a hand from one-finger pointing to immediate opening of the hand.

FIG. 9 shows one implementation of selecting a particular virtual object in response to a transition of physical arrangement of a hand from flat-hand hovering gesture to curling gesture.

FIG. 10 is one implementation of selecting a particular virtual object in response to a transition of physical arrangement of a hand from bunched fingers to spreading apart of the fingers and to immediate bunching of the fingers.

FIG. 11 is one implementation of selecting a particular virtual object in response to a transition of physical arrangement of a hand from flat-hand hovering gesture to okay gesture.

FIGS. 12A, 12B and 12C illustrate one implementation of generating for display a proximity indicator for a hand and automatically selecting a particular virtual object when the hand approaches the virtual object within an initial hover proximity threshold.

FIGS. 13A and 13B depict one implementation of selecting a particular virtual object in response to firing of a finger gun.

FIGS. 14A and 14B are one implementation of selecting a particular virtual object in response to rotation of a hand.

FIG. 15 illustrates one implementation of a method of selecting a virtual item from a virtual grid in a three-dimensional (3D) sensory space.

FIG. 16 is a flowchart showing a method of navigating a virtual modality displaying a plurality of virtual items arranged in a grid.

DESCRIPTION

Implementations of the technology disclosed relate to methods and systems that facilitate gestural interactions with a virtual grid in a three-dimensional (3D) sensory space. The technology disclosed can be applied to solve the problem of how the user interacts with the augmented reality environment that is displayed. Existing AR systems restrict the user experience and prevent complete immersion into the real world by limiting the degrees of freedom to control virtual objects. Where interaction is enabled, it is coarse, imprecise, and cumbersome, and it interferes with the user's natural movement. Such considerations of cost, complexity and convenience have limited the deployment and use of AR technology.

The technology disclosed allows the user to freely move around the AR environment and interact with the augmented object through free-form gestures. Examples of systems, apparatus, and methods according to the disclosed implementations are described in a “virtual slips” context. The examples of “virtual slips” are provided solely to add context and aid in the understanding of the disclosed implementations. In other instances, examples of gesture-based AR interactions in other contexts, like virtual games, virtual applications, virtual programs, virtual operating systems, etc., may be used. Other applications are possible, such that the following examples should not be taken as definitive or limiting either in scope, context, or setting. It will thus be apparent to one skilled in the art that implementations may be practiced in or outside the “virtual slips” context.

As used herein, a given signal, event or value is “responsive to” a predecessor signal, event or value if the predecessor signal, event or value influenced the given signal, event or value. If there is an intervening processing element, step or time period, the given signal, event or value can still be “responsive to” the predecessor signal, event or value. If the intervening processing element or step combines more than one signal, event or value, the signal output of the processing element or step is considered “responsive to” each of the signal, event or value inputs. If the given signal, event or value is the same as the predecessor signal, event or value, this is merely a degenerate case in which the given signal, event or value is still considered to be “responsive to” the predecessor signal, event or value. “Responsiveness” or “dependency” or “basis” of a given signal, event or value upon another signal, event or value is defined similarly.

As used herein, the “identification” of an item of information does not necessarily require the direct specification of that item of information. Information can be “identified” in a field by simply referring to the actual information through one or more layers of indirection, or by identifying one or more items of different information which are together sufficient to determine the actual item of information. In addition, the term “specify” is used herein to mean the same as “identify.”

In this application, a reference numeral not followed by a letter of the alphabet and not corresponding to at least one reference numeral in the figures refers to a collection of reference numerals followed by at least one letter of the alphabet and with the same base reference numeral. For example, reference numeral 412 refers to the collection of reference numerals including 412 and followed by a letter of the alphabet, such as 412A, 412B, and the like.

Gesture Recognition System

The term “motion capture” refers generally to processes that capture movement of a subject in three-dimensional (3D) space and translate that movement into, for example, a digital model or other representation. Motion capture is typically used with complex subjects that have multiple separately articulating members whose spatial relationships change as the subject moves. For instance, if the subject is a walking person, not only does the whole body move across space, but the positions of arms and legs relative to the person's core or trunk are constantly shifting. Motion-capture systems are typically designed to model this articulation.

Motion capture systems can utilize one or more cameras to capture sequential images of an object in motion, and computers to analyze the images to create a reconstruction of an object's shape, position, and orientation as a function of time. For 3D motion capture, at least two cameras are typically used. Image-based motion-capture systems rely on the ability to distinguish an object of interest from a background. This is often achieved using image-analysis algorithms that detect edges, typically by comparing pixels to detect abrupt changes in color and/or brightness. Conventional systems, however, suffer performance degradation under many common circumstances, e.g., low contrast between the object of interest and the background and/or patterns in the background that may falsely register as object edges.

Referring first to FIG. 1, an exemplary gesture recognition system 100 includes any number of cameras 102, 104 coupled to an image analysis, motion capture, and augmented reality (AR) generation system 106. (The system 106 is hereinafter variably referred to as the “image analysis and motion capture system,” the “image analysis system,” the “motion capture system,” the “control and image-processing system,” the “control system,” the “image-processing system,” or the “augmented reality (AR) generation system,” depending on which functionality of the system is being discussed.) Cameras 102, 104 can be any type of cameras, including cameras sensitive across the visible spectrum or, more typically, with enhanced sensitivity to a confined wavelength band (e.g., the infrared (IR) or ultraviolet bands); more generally, the term “camera” herein refers to any device (or combination of devices) capable of capturing an image of an object and representing that image in the form of digital data. While illustrated using an example of a two-camera implementation, other implementations are readily achievable using different numbers of cameras or non-camera light-sensitive image sensors or combinations thereof. For example, line sensors or line cameras rather than conventional devices that capture a two-dimensional (2D) image can be employed. Further, the term “light” is used generally to connote any electromagnetic radiation, which may or may not be within the visible spectrum, and may be broadband (e.g., white light) or narrowband (e.g., a single wavelength or narrow band of wavelengths).

Cameras 102, 104 are preferably capable of capturing video images (i.e., successive image frames at a constant rate of at least 15 frames per second), although no particular frame rate is required. The capabilities of cameras 102, 104 are not critical to the technology disclosed, and the cameras can vary as to frame rate, image resolution (e.g., pixels per image), color or intensity resolution (e.g., number of bits of intensity data per pixel), focal length of lenses, depth of field, etc. In general, for a particular application, any cameras capable of focusing on objects within a spatial volume of interest can be used. For instance, to capture motion of the hand of an otherwise stationary person, the volume of interest can be defined as a cube approximately one meter on a side.

In some implementations, the illustrated system 100 includes one or more sources 108, 110, which can be disposed to either side of cameras 102, 104, and are controlled by image analysis and motion capture system 106. In one implementation, the sources 108, 110 are light sources. For example, the light sources can be infrared light sources, e.g., infrared light emitting diodes (LEDs), and cameras 102, 104 can be sensitive to infrared light. Use of infrared light can allow the gesture recognition system 100 to operate under a broad range of lighting conditions and can avoid various inconveniences or distractions that may be associated with directing visible light into the region where the person is moving. However, a particular wavelength or region of the electromagnetic spectrum can be required. In one implementation, filters 120, 122 are placed in front of cameras 102, 104 to filter out visible light so that only infrared light is registered in the images captured by cameras 102, 104. In another implementation, the sources 108, 110 are sonic sources providing sonic energy appropriate to one or more sonic sensors (not shown in FIG. 1 for clarity's sake) used in conjunction with, or instead of, cameras 102, 104. The sonic sources transmit sound waves to the user, with the user either blocking (“sonic shadowing”) or altering the sound waves (“sonic deflections”) that impinge upon her. Such sonic shadows and/or deflections can also be used to detect the user's gestures and/or provide presence information and/or distance information using ranging techniques. In some implementations, the sound waves are, for example, ultrasound, which is not audible to humans.

It should be stressed that the arrangement shown in FIG. 1 is representative and not limiting. For example, lasers or other light sources can be used instead of LEDs. In implementations that include laser(s), additional optics (e.g., a lens or diffuser) can be employed to widen the laser beam (and make its field of view similar to that of the cameras). Useful arrangements can also include short-angle and wide-angle illuminators for different ranges. Light sources are typically diffuse rather than specular point sources; for example, packaged LEDs with light-spreading encapsulation are suitable.

In operation, light sources 108, 110 are arranged to illuminate a region of interest 112 that includes an entire control object 114 or its portion (in this example, a hand) that may optionally hold a tool or other object of interest. Cameras 102, 104 are oriented toward the region of interest 112 to capture video images of the object (the hand) 114. In some implementations, the operation of light sources 108, 110 and cameras 102, 104 is controlled by the image analysis and motion capture system 106, which can be, e.g., a computer system, control logic implemented in hardware and/or software or combinations thereof. Based on the captured images, image analysis and motion capture system 106 determines the position and/or motion of the object (the hand) 114.

Gesture recognition can be improved by enhancing contrast between the object of interest, such as the hand 114, and background surfaces such as the surface 116 visible in an image, for example, by means of controlled lighting directed at the object. For instance, in a motion capture system 106 where an object of interest 114, such as a person's hand, is significantly closer to the cameras 102 and 104 than the background surface 116, the falloff of light intensity with distance (1/r² for point-like light sources) can be exploited by positioning a light source (or multiple light sources) near the camera(s) or other image-capture device(s) and shining that light onto the object of interest 114. Source light reflected by the nearby object of interest 114 can be expected to be much brighter than light reflected from the more distant background surface 116, and the more distant the background (relative to the object), the more pronounced the effect will be. Accordingly, a threshold cutoff on pixel brightness in the captured images can be used to distinguish “object” pixels from “background” pixels. While broadband ambient light sources can be employed, various implementations use light having a confined wavelength range and a camera matched to detect such light; for example, an infrared source light can be used with one or more cameras sensitive to infrared frequencies.

In operation, cameras 102, 104 are oriented toward a region of interest 112 in which an object of interest 114 (in this example, a hand) and one or more background objects 116 can be present. Light sources 108, 110 are arranged to illuminate region of interest 112. In some implementations, one or more of the light sources 108, 110 and one or more of the cameras 102, 104 are disposed below the motion to be detected, e.g., in the case of hand motion, on a table or other surface beneath the spatial region where hand motion occurs. This is an optimal location because the amount of information recorded about the hand is proportional to the number of pixels it occupies in the camera images, and the hand will occupy more pixels when the camera's angle with respect to the hand's “pointing direction” is as close to perpendicular as possible. Further, if the cameras 102, 104 are looking up, there is little likelihood of confusion with background objects (clutter on the user's desk, for example) and other people within the cameras' field of view.

Control and image-processing system 106, which can be, e.g., a computer system, can control the operation of light sources 108, 110 and cameras 102, 104 to capture images of region of interest 112. Based on the captured images, the image-processing system 106 determines the position and/or motion of the object of interest 114. For example, as a step in determining the position of the object of interest 114, image-analysis system 106 can determine which pixels of various images captured by cameras 102, 104 contain portions of the object of interest 114. In some implementations, any pixel in an image can be classified as an “object” pixel or a “background” pixel depending on whether that pixel contains a portion of the object of interest 114 or not. With the use of light sources 108, 110, classification of pixels as object or background pixels can be based on the brightness of the pixel. For example, the distance (rO) between an object of interest 114 and cameras 102, 104 is expected to be smaller than the distance (rB) between background object(s) 116 and cameras 102, 104. Because the intensity of light from sources 108, 110 decreases as 1/r², the object of interest 114 will be more brightly lit than background 116, and pixels containing portions of the object of interest 114 (i.e., object pixels) will be correspondingly brighter than pixels containing portions of background 116 (i.e., background pixels). For example, if rB/rO=2, then object pixels will be approximately four times brighter than background pixels, assuming the object of interest 114 and background 116 are similarly reflective of the light from sources 108, 110, and further assuming that the overall illumination of region of interest 112 (at least within the frequency band captured by cameras 102, 104) is dominated by light sources 108, 110. These assumptions generally hold for suitable choices of cameras 102, 104, light sources 108, 110, filters 120, 122, and objects commonly encountered. For example, light sources 108, 110 can be infrared LEDs capable of strongly emitting radiation in a narrow frequency band, and filters 120, 122 can be matched to the frequency band of light sources 108, 110. Thus, although a human hand or body, or a heat source or other object in the background, may emit some infrared radiation, the response of cameras 102, 104 can still be dominated by light originating from sources 108, 110 and reflected by the object of interest 114 and/or background 116.

In this arrangement, image-analysis system 106 can quickly and accurately distinguish object pixels from background pixels by applying a brightness threshold to each pixel. For example, pixel brightness in a CMOS sensor or similar device can be measured on a scale from 0.0 (dark) to 1.0 (fully saturated), with some number of gradations in between depending on the sensor design. The brightness encoded by the camera pixels scales standardly (linearly) with the luminance of the object, typically due to the deposited charge or diode voltages. In some implementations, light sources 108, 110 are bright enough that reflected light from an object at distance rO produces a brightness level of 1.0 while an object at distance rB=2rO produces a brightness level of 0.25. Object pixels can thus be readily distinguished from background pixels based on brightness. Further, edges of the object can also be readily detected based on differences in brightness between adjacent pixels, allowing the position of the object within each image to be determined. Correlating object positions between images from cameras 102, 104 allows image-analysis system 106 to determine the location in 3D space of the object of interest 114, and analyzing sequences of images allows image-analysis system 106 to reconstruct 3D motion of the object of interest 114 using conventional motion algorithms.
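
By way of a non-limiting illustration, the following Python sketch applies the brightness-threshold classification described above, assuming pixel brightness normalized to the 0.0 to 1.0 scale. With the background roughly twice as far away as the object, background pixels register near 0.25 and object pixels near 1.0, so a cutoff such as 0.5 separates the two populations; the threshold value and function names are illustrative assumptions.

    import numpy as np

    def classify_pixels(frame, threshold=0.5):
        # Return a boolean mask: True for "object" pixels, False for "background"
        # pixels, given a 2D array of normalized brightness values in [0.0, 1.0].
        return frame >= threshold

    def object_edges(mask):
        # Edges appear where adjacent pixels change classification; the two arrays
        # mark horizontal and vertical transitions respectively.
        horizontal = np.diff(mask.astype(np.int8), axis=1) != 0
        vertical = np.diff(mask.astype(np.int8), axis=0) != 0
        return horizontal, vertical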

In accordance with various implementations of the technology disclosed, the cameras 102, 104 (and typically also the associated image-analysis functionality of control and image-processing system 106) are operated in a low-power mode until an object of interest 114 is detected in the region of interest 112. For purposes of detecting the entrance of an object of interest 114 into this region, the system 100 further includes one or more light sensors 118 that monitor the brightness in the region of interest 112 and detect any change in brightness. For example, a single light sensor including, e.g., a photodiode that provides an output voltage indicative of (and over a large range proportional to) a measured light intensity may be disposed between the two cameras 102, 104 and oriented toward the region of interest 112. The one or more sensors 118 continuously measure one or more environmental illumination parameters such as the brightness of light received from the environment. Under static conditions (which imply the absence of any motion in the region of interest 112), the brightness will be constant. If an object enters the region of interest 112, however, the brightness may abruptly change. For example, a person walking in front of the sensor(s) 118 may block light coming from an opposing end of the room, resulting in a sudden decrease in brightness. In other situations, the person may reflect light from a light source in the room onto the sensor, resulting in a sudden increase in measured brightness.

The aperture of the sensor(s) 118 may be sized such that its (or their collective) field of view overlaps with that of the cameras 102, 104. In some implementations, the field of view of the sensor(s) 118 is substantially co-existent with that of the cameras 102, 104 such that substantially all objects entering the camera field of view are detected. In other implementations, the sensor field of view encompasses and exceeds that of the cameras. This enables the sensor(s) 118 to provide an early warning if an object of interest approaches the camera field of view. In yet other implementations, the sensor(s) capture(s) light from only a portion of the camera field of view, such as a smaller area of interest located in the center of the camera field of view.

The control and image-processing system 106 monitors the output of the sensor(s) 118, and if the measured brightness changes by a set amount (e.g., by 10% or a certain number of candela), it recognizes the presence of an object of interest in the region of interest 112. The threshold change may be set based on the geometric configuration of the region of interest and the motion-capture system, the general lighting conditions in the area, the sensor noise level, and the expected size, proximity, and reflectivity of the object of interest so as to minimize both false positives and false negatives. In some implementations, suitable settings are determined empirically, e.g., by having a person repeatedly walk into and out of the region of interest 112 and tracking the sensor output to establish a minimum change in brightness associated with the person's entrance into and exit from the region of interest 112. Of course, theoretical and empirical threshold-setting methods may also be used in conjunction. For example, a range of thresholds may be determined based on theoretical considerations (e.g., by physical modelling, which may include ray tracing, noise estimation, etc.), and the threshold thereafter fine-tuned within that range based on experimental observations.
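
A minimal sketch of the wake-on-brightness-change test is given below, assuming a normalized sensor reading and using the 10% figure mentioned above as a default; the baseline handling and function name are illustrative assumptions rather than the actual control logic of system 106.

    def should_wake(baseline_brightness, current_brightness, relative_threshold=0.10):
        # Report that an object of interest may have entered the region when the
        # sensor reading departs from its baseline by more than the set fraction.
        if baseline_brightness == 0:
            return current_brightness > 0
        change = abs(current_brightness - baseline_brightness) / baseline_brightness
        return change > relative_threshold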

In implementations where the region of interest 112 is illuminated, the sensor(s) 118 will generally, in the absence of an object in this area, only measure scattered light amounting to a small fraction of the illumination light. Once an object enters the illuminated area, however, this object may reflect substantial portions of the light toward the sensor(s) 118, causing an increase in the measured brightness. In some implementations, the sensor(s) 118 is (or are) used in conjunction with the light sources 108, 110 to deliberately measure changes in one or more environmental illumination parameters such as the reflectivity of the environment within the wavelength range of the light sources. The light sources may blink, and a brightness differential be measured between dark and light periods of the blinking cycle. If no object is present in the illuminated region, this yields a baseline reflectivity of the environment. Once an object is in the region of interest 112, the brightness differential will increase substantially, indicating increased reflectivity. (Typically, the signal measured during dark periods of the blinking cycle, if any, will be largely unaffected, whereas the reflection signal measured during the light period will experience a significant boost.) Accordingly, the control system 106 monitoring the output of the sensor(s) 118 may detect an object in the region of interest 112 based on a change in one or more environmental illumination parameters such as environmental reflectivity that exceeds a predetermined threshold (e.g., by 10% or some other relative or absolute amount). As with changes in brightness, the threshold change may be set theoretically based on the configuration of the image-capture system and the monitored space as well as the expected objects of interest, and/or experimentally based on observed changes in reflectivity.
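
The blink-differential measurement described above can be sketched as follows; the sampling scheme, the baseline handling, and the 10% relative threshold are illustrative assumptions, not the actual detection logic of control system 106.

    def object_entered(light_samples, dark_samples, baseline_differential,
                       relative_threshold=0.10):
        # Compare brightness sampled during the "light" and "dark" halves of the
        # blinking cycle against a baseline differential taken with no object present.
        differential = (sum(light_samples) / len(light_samples)
                        - sum(dark_samples) / len(dark_samples))
        if baseline_differential <= 0:
            return differential > 0
        return (differential - baseline_differential) / baseline_differential > relative_threshold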

Computer System

FIG. 2 is a simplified block diagram of a computer system 200 implementing image analysis and motion capture system 106 according to an implementation of the technology disclosed. Image analysis and motion capture system 106 can include or consist of any device or device component that is capable of capturing and processing image data. In some implementations, computer system 200 includes a processor 206, memory 208, a sensor interface 242, a display 202 (or other presentation mechanism(s), e.g., holographic projection systems, wearable goggles or other head-mounted displays (HMDs), heads-up displays (HUDs), other visual presentation mechanisms or combinations thereof), speakers 212, a keyboard 222, and a mouse 232. Memory 208 can be used to store instructions to be executed by processor 206 as well as input and/or output data associated with execution of the instructions. In particular, memory 208 contains instructions, conceptually illustrated as a group of modules described in greater detail below, that control the operation of processor 206 and its interaction with the other hardware components. An operating system directs the execution of low-level, basic system functions such as memory allocation, file management and operation of mass storage devices. The operating system may be or include a variety of operating systems such as the Microsoft WINDOWS operating system, the Unix operating system, the Linux operating system, the Xenix operating system, the IBM AIX operating system, the Hewlett Packard UX operating system, the Novell NETWARE operating system, the Sun Microsystems SOLARIS operating system, the OS/2 operating system, the BeOS operating system, the MAC OS operating system, the APACHE operating system, an OPENACTION operating system, iOS, Android or other mobile operating systems, or another operating system platform.

The computing environment can also include other removable/non-removable, volatile/nonvolatile computer storage media. For example, a hard disk drive can read or write to non-removable, nonvolatile magnetic media. A magnetic disk drive can read from or write to a removable, nonvolatile magnetic disk, and an optical disk drive can read from or write to a removable, nonvolatile optical disk such as a CD-ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid-state RAM, solid-state ROM, and the like. The storage media are typically connected to the system bus through a removable or non-removable memory interface.

Processor 206 can be a general-purpose microprocessor, but depending on implementation can alternatively be a microcontroller, peripheral integrated circuit element, a CSIC (customer-specific integrated circuit), an ASIC (application-specific integrated circuit), a logic circuit, a digital signal processor, a programmable logic device such as an FPGA (field-programmable gate array), a PLD (programmable logic device), a PLA (programmable logic array), an RFID processor, smart chip, or any other device or arrangement of devices that is capable of implementing the actions of the processes of the technology disclosed.

Sensor interface 242 can include hardware and/or software that enables communication between computer system 200 and cameras such as cameras 102, 104 shown in FIG. 1, as well as associated light sources such as light sources 108, 110 of FIG. 1. Thus, for example, sensor interface 242 can include one or more data ports 244, 245 to which cameras can be connected, as well as hardware and/or software signal processors to modify data signals received from the cameras (e.g., to reduce noise or reformat data) prior to providing the signals as inputs to a motion-capture (“mocap”) program 218 executing on processor 206. In some implementations, sensor interface 242 can also transmit signals to the cameras, e.g., to activate or deactivate the cameras, to control camera settings (frame rate, image quality, sensitivity, etc.), or the like. Such signals can be transmitted, e.g., in response to control signals from processor 206, which can in turn be generated in response to user input or other detected events.

Sensor interface 242 can also include controllers 243, 246, to which light sources (e.g., light sources 108, 110) can be connected. In some implementations, controllers 243, 246 provide operating current to the light sources, e.g., in response to instructions from processor 206 executing mocap program 218. In other implementations, the light sources can draw operating current from an external power supply, and controllers 243, 246 can generate control signals for the light sources, e.g., instructing the light sources to be turned on or off or changing the brightness. In some implementations, a single controller can be used to control multiple light sources.

Instructions defining mocap program 218 are stored in memory 208, and these instructions, when executed, perform motion-capture analysis on images supplied from cameras connected to sensor interface 242. In one implementation, mocap program 218 includes various modules, such as an object detection module 228, an object analysis module 238, and a gesture-recognition module 248. Object detection module 228 can analyze images (e.g., images captured via sensor interface 242) to detect edges of an object therein and/or other information about the object's location. Object analysis module 238 can analyze the object information provided by object detection module 228 to determine the 3D position and/or motion of the object (e.g., a user's hand). Examples of operations that can be implemented in code modules of mocap program 218 are described below. Memory 208 can also include other information and/or code modules used by mocap program 218, such as an augmented reality (AR) library 258 that serves as an image repository of virtual objects and an application platform 268, which allows a user to interact with the mocap program 218 using different applications like application 1 (App1), application 2 (App2), and application N (AppN).
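
By way of a non-limiting illustration, the following sketch shows how a captured frame might flow through the three modules named above; the function signatures are illustrative assumptions rather than the actual interfaces of mocap program 218.

    def process_frame(frame, object_detection, object_analysis, gesture_recognition):
        # Schematic pipeline: detect edges, resolve 3D position/motion, then
        # interpret the motion as a gesture (cf. modules 228, 238, and 248).
        edges = object_detection(frame)
        position_3d = object_analysis(edges)
        gesture = gesture_recognition(position_3d)
        return gesture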

Display 202, speakers 212, keyboard 222, and mouse 232 can be used to facilitate user interaction with computer system 200. In some implementations, results of gesture capture using sensor interface 242 and mocap program 218 can be interpreted as user input. For example, a user can perform hand gestures that are analyzed using mocap program 218, and the results of this analysis can be interpreted as an instruction to some other program executing on processor 206 (e.g., a web browser, word processor, or other application). Thus, by way of illustration, a user might use upward or downward swiping gestures to “scroll” a webpage currently displayed on display 202, use rotating gestures to increase or decrease the volume of audio output from speakers 212, and so on.

It will be appreciated that computer system 200 is illustrative and that variations and modifications are possible. Computer systems can be implemented in a variety of form factors, including server systems, desktop systems, laptop systems, tablets, smart phones or personal digital assistants, wearable devices, e.g., goggles, head-mounted displays (HMDs), wrist computers, and so on. A particular implementation can include other functionality not described herein, e.g., wired and/or wireless network interfaces, media playing and/or recording capability, etc. In some implementations, one or more cameras can be built into the computer or other device into which the sensor is embedded rather than being supplied as separate components. Further, an image analyzer can be implemented using only a subset of computer system components (e.g., as a processor executing program code, an ASIC, or a fixed-function digital signal processor, with suitable I/O interfaces to receive image data and output analysis results).

While computer system 200 is described herein with reference to particular blocks, it is to be understood that the blocks are defined for convenience of description and are not intended to imply a particular physical arrangement of component parts. Further, the blocks need not correspond to physically distinct components. To the extent that physically distinct components are used, connections between components (e.g., for data communication) can be wired and/or wireless as desired.

With reference to FIGS. 1 and 2, the user performs a gesture that is captured by the cameras 102, 104 as a series of temporally sequential images. In other implementations, cameras 102, 104 can capture any observable pose or portion of a user. For instance, if a user walks into the field of view near the cameras 102, 104, cameras 102, 104 can capture not only the whole body of the user, but the positions of arms and legs relative to the person's core or trunk. These are analyzed by a gesture-recognition module 248, which can be implemented as another module of the mocap program 218. Gesture-recognition module 248 provides input to an electronic device, allowing a user to remotely control the electronic device and/or manipulate virtual objects, such as prototypes/models, blocks, spheres, or other shapes, buttons, levers, or other controls, in a virtual environment displayed on display 202. The user can perform the gesture using any part of her body, such as a finger, a hand, or an arm. As part of gesture recognition or independently, the image analysis and motion capture system 106 can determine the shapes and positions of the user's hand in 3D space and in real time; see, e.g., U.S. Serial Nos. 61/587,554, 13/414,485, 61/724,091, and 13/724,357, filed on Jan. 17, 2012, Mar. 7, 2012, Nov. 8, 2012, and Dec. 21, 2012, respectively, the entire disclosures of which are hereby incorporated by reference. As a result, the image analysis and motion capture system processor 206 can not only recognize gestures for purposes of providing input to the electronic device, but can also capture the position and shape of the user's hand in consecutive video images in order to characterize the hand gesture in 3D space and reproduce it on the display screen 202.

In one implementation, the gesture-recognition module 248 compares the detected gesture to a library of gestures electronically stored as records in a database, which is implemented in the image analysis and motion capture system 106, the electronic device, or on an external storage system. (As used herein, the term “electronically stored” includes storage in volatile or non-volatile storage, the latter including disks, Flash memory, etc., and extends to any computationally addressable storage media (including, for example, optical storage).) For example, gestures can be stored as vectors, i.e., mathematically specified spatial trajectories, and the gesture record can have a field specifying the relevant part of the user's body making the gesture; thus, similar trajectories executed by a user's hand and head can be stored in the database as different gestures so that an application can interpret them differently.
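
By way of a non-limiting illustration, a gesture record stored as a sampled spatial trajectory with a body-part field, and a nearest-match lookup against such records, could be sketched as follows; the record fields, the distance metric, and the tolerance are illustrative assumptions rather than the stored library format.

    import math

    def trajectory_distance(a, b):
        # Mean point-to-point distance between two equally sampled 3D trajectories.
        pairs = list(zip(a, b))
        if not pairs:
            return float("inf")
        return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

    def match_gesture(detected_trajectory, body_part, library, max_distance=0.05):
        # Only compare against records made by the same body part, so similar
        # trajectories of the hand and the head resolve to different gestures.
        candidates = [g for g in library if g["body_part"] == body_part]
        if not candidates:
            return None
        best = min(candidates,
                   key=lambda g: trajectory_distance(detected_trajectory, g["trajectory"]))
        if trajectory_distance(detected_trajectory, best["trajectory"]) <= max_distance:
            return best["name"]
        return None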

Augmented Reality

Augmented reality (AR) generation system 106 includes a number of components for generating the AR environment 300B of FIG. 3B. The first component is a camera such as cameras 102 or 104 or other video input to generate a digitized video image of the real-world or user-interaction region. The camera can be any digital device that is dimensioned and configured to capture still or motion pictures of the real world and to convert those images to a digital stream of information that can be manipulated by a computer. For example, cameras 102 or 104 can be digital still cameras, digital video cameras, web cams, head-mounted displays, phone cameras, tablet personal computers, ultra-mobile personal computers, and the like.

The second component is a transparent, partially transparent, or semi-transparent user interface such as display 202 (embedded in a user computing device like a wearable goggle 352 or a smartphone like 342 or 382) that combines rendered 3D virtual imagery with a view of the real world, so that both are visible at the same time to a user. In some implementations, the rendered 3D virtual imagery can be projected using holographic, laser, stereoscopic, auto-stereoscopic, or volumetric 3D displays.

FIG. 3A shows an example of rendered 3D virtual imagery that is superimposed, as free-floating virtual modality 312, in the real-world physical space 360 depicted in FIG. 3B. FIG. 3A shows one implementation of virtual modality 312 hosting a virtual grid 332 that includes virtual slips 302 stratified by virtual gridlines 322. In some implementations, the virtual modality 312 can be created on the fly or can be retrieved from a repository.

FIG. 3B illustrates one implementation of an augmented reality (AR) environment 300B created by instantiation of a free-floating virtual modality 312 in a real-world physical space 360. In one implementation, computer-generated imagery, presented as free-floating virtual modality 312, can be rendered in front of a user as reflections using real-time rendering techniques such as orthographic or perspective projection, clipping, screen mapping, and rasterizing, and transformed into the field of view or current view space 360 of a live camera embedded in a video projector, holographic projection system, smartphone 342 or 382, wearable goggle 352 or other head-mounted display (HMD), or heads-up display (HUD). In some other implementations, transforming models into the current view space 360 can be accomplished using sensor output from onboard sensors. For example, gyroscopes, magnetometers and other motion sensors can provide angular displacements, angular rates and magnetic readings with respect to a reference coordinate frame, and that data can be used by a real-time onboard rendering engine to generate 3D imagery of virtual grid 332. If the user physically moves a user computing device 342, 352, or 382, resulting in a change of view of the embedded camera, the virtual modality 312 and computer-generated imagery can be updated accordingly using the sensor data.
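
By way of a non-limiting illustration, the following sketch counter-rotates the anchor point of the virtual modality when onboard sensors report a change in yaw, so the modality appears fixed in the real-world space as the device turns; the function name and the single-axis treatment are illustrative assumptions rather than the rendering engine's actual transform.

    import math

    def reanchor_modality(anchor_xyz, delta_yaw_radians):
        # Counter-rotate the modality's anchor about the viewer's vertical axis so
        # the modality appears stationary while the device's view rotates.
        x, y, z = anchor_xyz
        cos_a = math.cos(-delta_yaw_radians)
        sin_a = math.sin(-delta_yaw_radians)
        return (x * cos_a - z * sin_a, y, x * sin_a + z * cos_a)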

In some implementations, virtual modality 312 can include a variety of information from a variety of local or network information sources. Some examples of information include specifications, directions, recipes, data sheets, images, video clips, audio files, schemas, user interface elements, thumbnails, text, references or links, telephone numbers, blog or journal entries, notes, part numbers, dictionary definitions, catalog data, serial numbers, order forms, marketing or advertising, icons associated with objects managed by an OS, and any other information that may be useful to a user. Some examples of information resources include local databases or cache memory, network databases, websites, online technical libraries, other devices, or any other information resource that can be accessed by user computing devices 342, 352, or 382 either locally or remotely through a communication link.

Virtual items can include text, images, or references to other information (e.g., links). In one implementation, interactive virtual items can be displayed proximate to their corresponding real-world objects. In another implementation, interactive virtual items can describe or otherwise provide useful information about the objects to a user. In the example shown, virtual grid 332 includes a collection of virtual slips 302 that represent real-world sticky or post-it notes. Additional related information, such as the manufacturer and part number, can be included in the balloon callouts.

Some other implementations include the interactive virtual items representing other and/or different real-world products such as furniture (chairs, couches, tables, etc.), kitchen appliances (stoves, refrigerators, dishwashers, etc.), office appliances (copy machines, fax machines, computers), consumer and business electronic devices (telephones, scanners, etc.), furnishings (pictures, wall hangings, sculpture, knick-knacks, plants), fixtures (chandeliers and the like), cabinetry, shelving, floor coverings (tile, wood, carpets, rugs), wall coverings, paint colors, surface textures, countertops (laminate, granite, synthetic countertops), electrical and telecommunication jacks, audio-visual equipment, speakers, hardware (hinges, locks, door pulls, door knobs, etc.), exterior siding, decking, windows, shutters, shingles, banisters, newels, hand rails, stair steps, landscaping plants (trees, shrubs, etc.), and the like, and qualities of all of these (e.g., color, texture, finish, etc.).

Virtual modality 312 can generate for display virtual grid 332 automatically or in response to trigger events. For example, the virtual grid 332 may only appear in virtual modality 312 when the user selects an icon or invokes an application presented across the user computing devices 342, 352, or 382 and/or virtual modality 312. In other implementations, virtual modality 312 can be generated using a series of unique real-world markers. The markers can be of any design, including circular, linear, matrix, variable bit length matrix, multi-level matrix, black/white (binary), and gray scale patterns, and combinations thereof. The markers can be two-dimensional or three-dimensional. The markers can be two- or three-dimensional barcodes, or two- or three-dimensional renderings of real-world, three-dimensional objects. For example, the markers can be thumbnail images of the virtual images that are matched to the markers. The marker can also be an image of a real-world item which the software has been programmed to recognize. So, for example, the software can be programmed to recognize a sticky note or other item from a video stream of a book. The software then superimposes an interactive virtual item in place of the real-world sticky note. Each unique real-world marker corresponds to an interactive virtual item, or a quality of an interactive virtual item (e.g., the object's color, texture, opacity, adhesiveness, etc.), or both the interactive virtual item 302 itself and all (or a subset) of the qualities of the interactive virtual item.

The AR generation system 106 further uses an AR library 258 that serves as an image repository or database of interactive virtual items, a computer 200 that can selectively search and access the library 258, and a display 202 (embedded within a smartphone 342 or 382 or a virtual reality headset 352) or a projector, which are dimensioned and configured to display the real-world digital image captured by the camera, as well as interactive virtual items retrieved from the AR library 258. In some implementations, computer 200 includes a search and return engine that links each unique real-world marker to a corresponding interactive virtual item in the AR library 258.

In operation, the camera returns a digital video stream of the real world, including images of one or more of the markers described previously. Image samples are taken from the video stream and passed to the computer 200 for processing. The search and return engine then searches the AR library 258 for the interactive virtual items that correspond to the marker images contained in the digital video stream of the real world. Once a match is made between a real-world marker contained in the digital video stream and the AR library 258, the AR library 258 returns one or more interactive virtual items, their qualities, and their orientation to the display 202 of one of the user computing devices 342, 352, 382, or 392. The interactive virtual items are then superimposed upon the real-world image. The interactive virtual item is placed into the real-world image in registration with its corresponding marker. In other implementations, multiple markers can be used to position and orient a single interactive virtual item. For example, twenty-five unique markers could be used to construct the virtual grid 332, which includes twenty-five virtual slips 302. In yet other implementations, a “markerless” AR experience can be generated by identifying features of the surrounding real-world physical environment via sensors such as gyroscopes, accelerometers, compasses, and GPS data such as coordinates.
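
By way of a non-limiting illustration, the search-and-return step could be sketched as a lookup from recognized marker identifiers to the corresponding interactive virtual items; the dictionary-backed library below stands in for AR library 258 and is an illustrative assumption rather than its actual structure.

    def search_and_return(detected_marker_ids, ar_library):
        # ar_library maps a marker id to a record such as
        # {"item": ..., "qualities": ..., "orientation": ...}; unmatched markers
        # are skipped, and the matched records are handed to the display for
        # superimposition in registration with their markers.
        return [ar_library[marker_id] for marker_id in detected_marker_ids
                if marker_id in ar_library]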

Projected AR allows users to simultaneously view the real-world physical space 360 and the interactive virtual items (e.g., virtual modality 312, virtual grid 332, or virtual slips 302) superimposed in the space 360. In one implementation, these interactive virtual items can be projected onto the real-world physical space 360 using micro-projectors embedded in wearable goggle 352 or other head-mounted display (HMD) that cast a perspective view of stereoscopic 3D imagery onto the real-world physical space 360. In such an implementation, a camera in between the micro-projectors can scan for infrared identification markers placed in the real-world physical space 360. The camera can use these markers to precisely track the user's head position and orientation in the real-world physical space 360, according to another implementation. Yet another implementation includes using retro-reflectors in the real-world physical space 360 to prevent scattering of light emitted by the micro-projectors and to provision multi-user participation by maintaining distinct and private user views. In such an implementation, multiple users can simultaneously interact with the same virtual modality, such that they both view the same virtual objects and manipulations to virtual objects by one user are seen by the other user.

In other implementations, projected AR obviates the need for wearable hardware such as goggles and other hardware like displays to create an AR experience. In such implementations, a video projector, volumetric display device, holographic projector, and/or heads-up display can be used to create a “glasses-free” AR environment. See, e.g., holographic chip projectors available from Ostendo, a company headquartered in Carlsbad, California (online.wsj.com/articles/new-chip-to-bring-holograms-to-smartphones-1401752938). In one implementation, such projectors can be electronically coupled to user computing devices such as smartphones 342 or 382 or laptop 392 and configured to produce and magnify virtual items (e.g., virtual modality 312, virtual grid 332, or virtual slips 302) that are perceived as being overlaid on the real-world physical space 360.

The third component is a control and image-processing system 106, which captures a series of temporally sequential images of a region of interest. It further identifies any gestures performed in the region of interest and controls responsiveness of the rendered 3D virtual imagery to the performed gestures by updating the 3D virtual imagery based on the corresponding gestures.

Gestural Interactions

As discussed above, one or more user-interface components in user computing devices 342, 352, or 382 can be used to present virtual grid 332 to a user via a visual display (e.g., a thin-film-transistor display, liquid crystal display, or organic light-emitting-diode display) and/or an audio speaker. In one implementation, user-interface components can receive information from the user through a touchscreen, buttons, scroll component (e.g., a movable or virtual ring component), microphone, and/or camera (e.g., to detect gestures).

As shown in FIG. 3B, a user can interact with a virtual modality 312 by performing a gesture with a hand 372 and/or other body movements. In one implementation, pure gestures, gestures in combination with voice recognition, and/or a virtual or real keyboard in combination with gestures can be used to select a virtual slip in virtual grid 332. In another implementation, a control console that recognizes gestures can be used to control the virtual modality 312. In yet another implementation, a user can use a pure gesture with the hand 372 or a combination 362 of a gesture and a held tool or other object to navigate (e.g., tilting, zooming, panning, moving) the 3D imagery hosted by the virtual modality 312.

In some implementations, a user can raise an arm, utter a verbal command, perform an optical command, or make different poses using hands and fingers (e.g., ‘one finger point’, ‘one finger click’, ‘two finger point’, ‘two finger click’, ‘prone one finger point’, ‘prone one finger click’, ‘prone two finger point’, ‘prone two finger click’, ‘medial one finger point’, ‘medial two finger point’) to select a particular virtual slip in virtual grid 332. In other implementations, a point and grasp gesture can be used to move a cursor on virtual modality 312, verbal commands can be used to select a function, application, or program, eye movements can be used to move a cursor, and blinking can indicate a selection.
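A minimal sketch of how such inputs might be dispatched to grid actions follows. The pose labels mirror the list above; the event tuple format and the returned action strings are assumptions made only for illustration.

```python
# Hypothetical dispatch from recognized poses and commands to grid actions.
SELECTION_POSES = {
    "one finger point", "one finger click",
    "two finger point", "two finger click",
    "prone one finger point", "prone one finger click",
    "prone two finger point", "prone two finger click",
    "medial one finger point", "medial two finger point",
}

def interpret_input(event):
    """Map a recognized pose or command to a high-level action on the grid."""
    kind, value = event  # e.g. ("pose", "one finger click") or ("verbal", "play song")
    if kind == "pose" and value in SELECTION_POSES:
        return "select_slip"          # poses select a particular virtual slip
    if kind == "verbal":
        return f"invoke:{value}"      # verbal commands select a function or program
    if kind == "eye_move":
        return "move_cursor"          # eye movements move a cursor
    if kind == "blink":
        return "confirm_selection"    # blinking indicates a selection
    return "ignore"

print(interpret_input(("pose", "one finger click")))  # -> select_slip
```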

These gestures allow a user to manipulate the computer-generated virtual objects superimposed in the real-world space 360. In one implementation, the user can move his or her hand 372 underneath a virtual object (e.g. virtual slip 302) to scoop it up in the palm of the hand and move the virtual object (e.g. virtual slip 302) from one location to another. In another implementation, the user can use a tool to enhance the graphics of the virtual object (e.g. virtual slip 302). In yet another implementation, manipulations can be based on physics-simulated virtual forces (e.g., virtual gravity, virtual electromagnetism, virtual impulses, virtual friction, virtual charisma, virtual stacking (placing virtual objects inside one another), etc.), enabling interactions with virtual objects over distances. For example, a "gravity grab" interaction in an astronomy-genre gaming engine or a physics teaching implementation includes emulating the force of gravity by selecting a function in which the strength is proportional to a "virtual mass" of the virtual object but declines with the square of the distance between the hand and the virtual object. In implementations employing strength to emulate virtual properties of objects, virtual flexibility/rigidity enables virtual objects emulating one type of material to have different interactions than virtual objects emulating another type of material. For example, a virtual steel sphere can respond differently to a "squeeze" gesture than a virtual rubber sphere. Virtual properties (e.g., virtual mass, virtual distance, virtual flexibility/rigidity, etc.) and virtual forces (e.g., virtual gravity, virtual electromagnetism, virtual charisma, etc.), like virtual objects, can be created (i.e., having no analog in the physical world) or modeled (i.e., having an analog in the physical world). Normal vectors or gradients can be used in some other implementations. In yet other implementations, the virtual objects can be rendered around the user's hand such that the computer-generated imagery moves in conjunction with, and synchronously with, the performance of the gestures.
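The "gravity grab" strength function described above can be written out directly: strength proportional to the object's virtual mass, falling off with the square of the hand-to-object distance. The constant G_VIRTUAL and the minimum-distance clamp below are illustrative assumptions, not values from the disclosure.

```python
# Worked sketch of the "gravity grab" strength function.
G_VIRTUAL = 1.0       # tuning constant for the interaction strength (assumed)
MIN_DISTANCE = 0.05   # clamp to avoid a singularity when the hand touches the object

def gravity_grab_strength(virtual_mass, distance):
    d = max(distance, MIN_DISTANCE)
    return G_VIRTUAL * virtual_mass / (d * d)

# A heavier virtual object pulls harder; a farther hand feels a weaker pull.
print(gravity_grab_strength(virtual_mass=2.0, distance=0.5))   # 8.0
print(gravity_grab_strength(virtual_mass=2.0, distance=1.0))   # 2.0
```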

FIG. 4A is one implementation of navigating 400A a virtual modality 432 using a vertical sweep 422. Virtual modality 432 includes a collection of virtual slips arranged as a virtual grid 444 that are rendered to a user as free-floating computer-generated imagery in the user's field of view. In another implementation, a vertical sweep 422 of a control object 412 is detected responsive to a first control gesture 412A-412B in a three-dimensional (3D) sensory space. As shown in FIG. 4A, the vertical sweep 422 starts when the hand 412A is at the top right corner of the virtual modality 432 and is followed by a downward sweep that results in the hand 412B being at the bottom right corner of the virtual modality 432. This defines a vertical extent of translation 402 along a first axis of a virtual grid 444 in proportion to length of the vertical sweep 422 of the control object 412. In other implementations, the vertical sweep is an upward sweep. AR generation system 106 uses the defined vertical extent of translation 402 to accordingly alter the on-screen responsiveness of the virtual grid 444 and/or its contents corresponding to motion of the hand 412.
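A minimal sketch of defining an extent of translation in proportion to the length of a sweep, as in FIG. 4A, is shown below. The gain constant and the clamp to the grid's own extent are assumptions about how "in proportion" might be tuned; they are not taken from the disclosure.

```python
# Hypothetical mapping from sweep length to grid translation along one axis.
def extent_of_translation(sweep_length, grid_extent, gain=1.0):
    """Map the physical length of a sweep to how far the grid scrolls along
    one axis, capped at the grid's own extent."""
    extent = gain * sweep_length
    return max(-grid_extent, min(grid_extent, extent))

def apply_vertical_sweep(grid_offset_y, sweep_length_m, grid_height):
    # Downward sweep (FIG. 4A): a longer sweep scrolls the grid proportionally farther.
    return grid_offset_y + extent_of_translation(sweep_length_m, grid_height)

print(apply_vertical_sweep(grid_offset_y=0.0, sweep_length_m=0.3, grid_height=1.0))
```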

FIG. 4B illustrates one implementation of navigating the virtual modality 432 using a horizontal sweep 442 and selecting 400B a virtual object (e.g. virtual slip 462) in the virtual modality 432. In another implementation, a horizontal sweep 442 of the control object 412 is detected responsive to a second control gesture 412C-412D in the 3D sensory space. As illustrated in FIG. 4B, the horizontal sweep 442 starts when the hand 412C is at the right corner of the virtual modality 432 and is followed by a leftward sweep that results in the hand 412D being at the left corner of the virtual modality 432. This defines a horizontal extent of translation 452 along a second axis of the virtual grid 444 in proportion to length of the horizontal sweep 442 of the control object 412, such that the second axis is perpendicular to the first axis. In other implementations, the horizontal sweep is a rightward sweep. AR generation system 106 uses the defined horizontal extent of translation 452 to accordingly alter the on-screen responsiveness of the virtual grid 444 and/or its contents corresponding to motion of the hand 412.

Further, a virtual item (e.g. virtual slip 462) in the virtual grid 444 is automatically selected at a terminal end or terminal gesture 412E of the horizontal sweep. In one implementation, terminal gesture 412E is a flick of a whole hand, a flick of one of the individual fingers or thumb of a hand, or a flick of a set of bunched fingers or bunched fingers and thumb of a hand. In some implementations, selection of the virtual item (e.g. virtual slip 462) can be indicated by modifying a presentation property of the virtual item (e.g. virtual slip 462), including changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item (e.g. virtual slip 462). As depicted in FIG. 4B, AR generation system 106 updates graphics of the virtual slip 462A upon selection and further removes the virtual slip 462 from one virtual location 462A and posts it to another virtual location 462B (as illustrated by transition 472).
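The following is a hedged sketch of selecting the virtual slip at the terminal end of a sweep when a terminal "flick" gesture is recognized, and then modifying a presentation property of the selected slip. The Grid class, the item_nearest helper, and the gesture labels are hypothetical data structures used only for illustration.

```python
# Hypothetical terminal-gesture selection at the end of a sweep.
TERMINAL_GESTURES = {"whole_hand_flick", "finger_flick", "bunched_fingers_flick"}

class Grid:
    def __init__(self, items):
        self.items = items
    def item_nearest(self, point):
        return min(self.items,
                   key=lambda s: (s["x"] - point[0]) ** 2 + (s["y"] - point[1]) ** 2)

def select_at_terminal_end(grid, sweep_path, terminal_gesture):
    if terminal_gesture not in TERMINAL_GESTURES:
        return None
    end_point = sweep_path[-1]            # terminal end of the sweep
    slip = grid.item_nearest(end_point)   # the virtual item at that end
    slip["selected"] = True
    slip["color"] = "highlight"           # e.g. change color to indicate selection
    return slip

grid = Grid([{"x": 0, "y": 0}, {"x": 1, "y": 0}])
print(select_at_terminal_end(grid, [(0.9, 0.1)], "finger_flick"))
```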

FIG. 5A shows one implementation of identifying 500A, for selection, a particular virtual object (e.g. virtual slip or note 522) in a virtual modality 532 by placing a hand 512 behind or underneath the particular virtual object (e.g. virtual slip 522). As illustrated in FIG. 5A, a particular virtual note 522 is identified for selection responsive to positioning a hand 512 behind or underneath the particular virtual note 522.

FIG. 5B illustrates one implementation of selecting 500B the particular virtual object (e.g. virtual slip 522) in a virtual modality 532 using a scooping gesture 542. As shown in FIG. 5B, the particular virtual note (e.g. virtual slip 522) is selected responsive to a transition of physical arrangement of the hand 512 from a resting position 512A behind the particular virtual note (e.g. virtual slip 522) to an inward scoop 512B towards the particular virtual note (e.g. virtual slip 522). In yet other implementations, a selected virtual note (e.g. virtual slip 522) is deselected responsive to an outward scoop hand gesture.
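An illustrative sketch of selection driven by a transition between two physical arrangements of the hand follows; the rest-to-inward-scoop pair above is one instance, and the same pattern covers the fist-to-open, hover-to-pinch, and hover-to-curl transitions of FIGS. 6-11. The pose labels and transition table are assumptions, not a recognizer defined in the disclosure.

```python
# Hypothetical pose-transition table mapping arrangement changes to selection actions.
SELECTING_TRANSITIONS = {
    ("resting_behind_note", "inward_scoop"): "select",
    ("inward_scoop", "outward_scoop"): "deselect",
    ("clenched_fist", "open_hand"): "select",
    ("flat_hand_hover", "pinch"): "select",
    ("flat_hand_hover", "curl"): "select",
}

def on_pose_change(previous_pose, current_pose, note):
    action = SELECTING_TRANSITIONS.get((previous_pose, current_pose))
    if action == "select":
        note["selected"] = True
    elif action == "deselect":
        note["selected"] = False
    return note

note = {"id": "virtual slip 522", "selected": False}
on_pose_change("resting_behind_note", "inward_scoop", note)
print(note)   # {'id': 'virtual slip 522', 'selected': True}
```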

In some implementations, selection of the virtual item (e.g. virtual slip 522) can be indicated by modifying a presentation property of the virtual item (e.g. virtual slip 522), including changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item (e.g. virtual slip 522). As depicted in FIG. 5B, AR generation system 106 updates graphics of the virtual slip 522A upon selection and further brings the virtual slip 522A into the forefront 522B. In other implementations, the AR generation system 106 creates different depth layers in different portions of the virtual modality 532 and maintains the transition between depth layers (e.g. when the user grasps the virtual slip 522A or places it in his or her palm and raises the virtual slip 522A), thus simulating on-screen responsiveness of the virtual objects (e.g. virtual slip 522) similar to that of real-world objects.

FIG. 6 illustrates one implementation of selecting 600 a particular virtual object (e.g. virtual slip 652) in a virtual modality 632 in response to a transition of physical arrangement of a hand 602 from clenched fist 602A-602C to open hand 602D. As illustrated in FIG. 6, a vertical extent of navigation 612 is defined along a first axis in proportion to length of a vertical sweep of the hand 602 in the 3D sensory space. Also, a horizontal extent of navigation 622 is defined along a second axis in proportion to length of a horizontal sweep of the hand 602 in the 3D sensory space, such that the second axis is perpendicular to the first axis. Further, a virtual item (e.g. virtual slip 652) is automatically selected at a terminal end of the horizontal sweep responsive to a transition of physical arrangement 642 of the hand 602 from clenched fist 602C to open hand 602D.

In some implementations, selection of the virtual item (e.g. virtual slip 652) can be indicated by modifying a presentation property of the virtual item (e.g. virtual slip 652), including changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item (e.g. virtual slip 652). As depicted in FIG. 6, AR generation system 106 updates graphics of the virtual slip 652 upon selection.

FIG. 7 is one implementation of selecting 700 a particular virtual object (e.g. virtual slip 752) in a virtual modality 732 responsive to a transition of physical arrangement of a hand 702 from flat-hand hovering gesture 702A-702C to pinching gesture 702D. As shown in FIG. 7, a particular virtual note (e.g. virtual slip 752) is identified for selection responsive to flat-hand hovering 702A-702C of the hand 702 above the particular virtual note (e.g. virtual slip 752). A vertical extent of navigation 712 is defined along a first axis in proportion to length of a vertical sweep of the hand 702 in the 3D sensory space. Also, a horizontal extent of navigation 722 is defined along a second axis in proportion to length of a horizontal sweep of the hand 702 in the 3D sensory space, such that the second axis is perpendicular to the first axis. Further, the particular virtual note (e.g. virtual slip 752) is selected responsive to a transition of physical arrangement 742 of the hand 702 from flat-hand hovering 702A-702C above the particular virtual note (e.g. virtual slip 752) to pinching 702D of its thumb and one or more fingers towards the particular virtual note (e.g. virtual slip 752). In yet other implementations, a selected virtual note (e.g. virtual slip 752) is deselected responsive to an expanding hand gesture.

In some implementations, selection of the virtual item (e.g. virtual slip 752) can be indicated by modifying a presentation property of the virtual item (e.g. virtual slip 752), including changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item (e.g. virtual slip 752). As depicted in FIG. 7, AR generation system 106 updates graphics of the virtual slip 752 upon selection.

FIG. 8 depicts one implementation of selecting 800 a particular virtual object (e.g. virtual slip 852) in a virtual modality 832 responsive to a transition of physical arrangement of a hand 802 from one-finger pointing 802A-802C to immediate opening of the hand 802D. As shown in FIG. 8, a particular virtual note (e.g. virtual slip 852) is identified for selection responsive to a one-finger point gesture of the hand 802A-802C towards the particular virtual note (e.g. virtual slip 852). A vertical extent of navigation 812 is defined along a first axis in proportion to length of a vertical sweep of the hand 802 in the 3D sensory space. Also, a horizontal extent of navigation 822 is defined along a second axis in proportion to length of a horizontal sweep of the hand 802 in the 3D sensory space, such that the second axis is perpendicular to the first axis. Further, the particular virtual note (e.g. virtual slip 852) is selected responsive to a transition of physical arrangement 842 of the hand 802 from a one-finger point gesture of the hand 802A-802C towards the particular virtual note (e.g. virtual slip 852) to immediate opening of the hand 802D above the particular virtual note (e.g. virtual slip 852). In yet other implementations, a selected virtual note (e.g. virtual slip 852) is deselected responsive to clenching of the hand 802D.

In some implementations, selection of the virtual item (e.g. virtual slip 852) can be indicated by modifying a presentation property of the virtual item (e.g. virtual slip 852), including changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item (e.g. virtual slip 852). As depicted in FIG. 8, AR generation system 106 updates graphics of the virtual slip 852 upon selection.

FIG. 9 shows one implementation of selecting 900 a particular virtual object (e.g. virtual slip 952) in a virtual modality 932 responsive to a transition of physical arrangement of a hand 902 from flat-hand hovering gesture 902A-902C to curling gesture 902D. As shown in FIG. 9, a particular virtual note (e.g. virtual slip 952) is identified for selection responsive to flat-hand hovering 902A-902C of the hand 902 above the particular virtual note (e.g. virtual slip 952). A vertical extent of navigation 912 is defined along a first axis in proportion to length of a vertical sweep of the hand 902 in the 3D sensory space. Also, a horizontal extent of navigation 922 is defined along a second axis in proportion to length of a horizontal sweep of the hand 902 in the 3D sensory space, such that the second axis is perpendicular to the first axis. Further, the particular virtual note (e.g. virtual slip 952) is selected responsive to a transition of physical arrangement 942 of the hand 902 from flat-hand hovering 902A-902C above the particular virtual note (e.g. virtual slip 952) to curling of its thumb and fingers 902D above the particular virtual note (e.g. virtual slip 952). In yet other implementations, a selected virtual note (e.g. virtual slip 952) is deselected responsive to an expanding of the thumb and fingers of the hand 902D.

In some implementations, selection of the virtual item (e.g. virtual slip 952) can be indicated by modifying a presentation property of the virtual item (e.g. virtual slip 952), including changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item (e.g. virtual slip 952). As depicted in FIG. 9, AR generation system 106 updates graphics of the virtual slip 952 upon selection.

FIG. 10 is one implementation of selecting 1000 a particular virtual object (e.g. virtual slip 1052) in a virtual modality 1032 responsive to a transition of physical arrangement of a hand 1002 from bunched fingers 1002A-1002C to spreading apart of the fingers 1002D and to immediate bunching of the fingers 1002E. As shown in FIG. 10, a particular virtual note (e.g. virtual slip 1052) is identified for selection responsive to bunched-fingers hovering 1002A-1002C of the hand 1002 above the particular virtual note (e.g. virtual slip 1052). A vertical extent of navigation 1012 is defined along a first axis in proportion to length of a vertical sweep of the hand 1002 in the 3D sensory space. Also, a horizontal extent of navigation 1022 is defined along a second axis in proportion to length of a horizontal sweep of the hand 1002 in the 3D sensory space, such that the second axis is perpendicular to the first axis. Further, the particular virtual note (e.g. virtual slip 1052) is selected responsive to a transition of physical arrangement 1042 of the hand 1002 from bunched-fingers hovering 1002C above the particular virtual note (e.g. virtual slip 1052) to spreading apart of the fingers 1002D above the particular virtual note (e.g. virtual slip 1052) and to immediate bunching of the fingers 1002E (e.g. transition 1062). In yet other implementations, a selected virtual note (e.g. virtual slip 1052) is deselected responsive to a spreading apart of the fingers above the particular virtual note (e.g. virtual slip 1052) without a subsequent immediate bunching of the fingers.

In some implementations, selection of the virtual item (e.g. virtual slip 1052) can be indicated by modifying a presentation property of the virtual item (e.g. virtual slip 1052), including changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item (e.g. virtual slip 1052). As depicted in FIG. 10, AR generation system 106 updates graphics of the virtual slip 1052 upon selection.

FIG. 11 is one implementation of selecting 1100 a particular virtual object (e.g. virtual slip 1152) in a virtual modality 1132 responsive to a transition of physical arrangement of a hand 1102 from flat-hand hovering gesture 1102A-1102C to okay gesture 1102D. As shown in FIG. 11, a particular virtual note (e.g. virtual slip 1152) is identified for selection responsive to flat-hand hovering 1102A-1102C of the hand 1102 above the particular virtual note (e.g. virtual slip 1152). A vertical extent of navigation 1112 is defined along a first axis in proportion to length of a vertical sweep of the hand 1102 in the 3D sensory space. Also, a horizontal extent of navigation 1122 is defined along a second axis in proportion to length of a horizontal sweep of the hand 1102 in the 3D sensory space, such that the second axis is perpendicular to the first axis. Further, the particular virtual note (e.g. virtual slip 1152) is selected responsive to a transition of physical arrangement 1142 of the hand 1102 from flat-hand hovering 1102A-1102C above the particular virtual note (e.g. virtual slip 1152) to curling of its thumb and index finger 1102D above the particular virtual note (e.g. virtual slip 1152) with vertical expansion of the other fingers (i.e. an okay gesture). In yet other implementations, a selected virtual note (e.g. virtual slip 1152) is deselected responsive to a paralleling of the thumb and fingers of the hand 1102.

In some implementations, selection of the virtual item (e.g. virtual slip 1152) can be indicated by modifying a presentation property of the virtual item (e.g. virtual slip 1152), including changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item (e.g. virtual slip 1152). As depicted in FIG. 11, AR generation system 106 updates graphics of the virtual slip 1152 upon selection.

FIGS. 12A-12C illustrate one implementation of generating for display a proximity indicator 1222A-1222C for a hand 1232 and automatically selecting 1200A-1200C a particular virtual object (e.g. virtual slip 1212) in a virtual modality 1202 when the hand approaches the virtual object (e.g. virtual slip 1212) within an initial hover proximity threshold. In one implementation, a proximity indicator 1222A-1222C is generated for display that provides visual feedback regarding proximity (e.g. distances d1, d2) of the hand 1232 to a particular virtual note (e.g. virtual slip 1212) and escalation from proximity 1232A-1232B to contact 1232C of the hand 1232 with the particular virtual note (e.g. virtual slip 1212).

Custom logic for proximity indicator 1222 is defined such that proximity indicator 1222A-1222C is larger in size when the hand 1232 is farther away at distance d1 from the virtual slip 1212 than when it is closer at distance d2, thus being proportionally responsive to the distance between the hand 1232 and the virtual slip 1212. At action 1200A in FIG. 12A, hand 1232A is at an initial distance d1 from the virtual slip 1212 and thus the proximity indicator 1222 is of an initial size 1222A. At action 1200B in FIG. 12B, as the hand 1232 approaches the virtual slip 1212 at 1232B to a distance d2, the proximity indicator 1222 shrinks to a smaller size 1222B. Further, when the hand 1232 comes in contact with the virtual slip 1212 at 1232C in FIG. 12C, the proximity indicator 1222 is updated by the AR generation system 106 to an even smaller size 1222C and the virtual slip 1212 is automatically selected in response. In other implementations, the virtual slip 1212 is automatically selected when the virtual slip 1212 is within an initial hover proximity threshold of the hand 1232.
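A sketch of this proximity-indicator logic follows: the indicator shrinks as the hand approaches the slip, and selection fires on contact or within the initial hover proximity threshold. All constants and the linear sizing function are illustrative assumptions; the disclosure only requires that the indicator be proportionally responsive to distance.

```python
# Hedged sketch of proximity-indicator sizing and hover-threshold selection.
MAX_INDICATOR_SIZE = 1.0     # size when the hand is far away (assumed)
MIN_INDICATOR_SIZE = 0.1     # size at contact (assumed)
FAR_DISTANCE = 0.5           # distance at which the indicator is largest (assumed)
HOVER_THRESHOLD = 0.02       # initial hover proximity threshold (assumed)

def indicator_size(distance):
    """Indicator grows with distance, so it is proportionally responsive."""
    t = max(0.0, min(1.0, distance / FAR_DISTANCE))
    return MIN_INDICATOR_SIZE + t * (MAX_INDICATOR_SIZE - MIN_INDICATOR_SIZE)

def update_proximity(distance, slip):
    size = indicator_size(distance)
    if distance <= HOVER_THRESHOLD:      # contact or within the hover threshold
        slip["selected"] = True
    return size, slip

print(update_proximity(0.4, {"id": "virtual slip 1212"}))  # large indicator, unselected
print(update_proximity(0.0, {"id": "virtual slip 1212"}))  # smallest indicator, selected
```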

In some implementations, selection of the virtual item (e.g. virtual slip 1212) can be indicated by modifying a presentation property of the virtual item (e.g. virtual slip 1212), including changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item (e.g. virtual slip 1212). As depicted in FIGS. 12A-12B, AR generation system 106 updates graphics of the virtual slip 1212 upon selection. In other implementations, the generated display further includes modifying at least one of, or a combination of, the appearance, shape, or opacity of the proximity indicator 1222 responsive to distance between the hand 1232 and the particular virtual slip 1212.

FIGS. 13A-13B depict one implementation of selecting 1300A-1300B a particular virtual object (e.g. virtual slip 1313) in a virtual modality 1302 responsive to firing 1332B of a finger gun 1332A. As shown in FIGS. 13A-13B, a particular virtual note (e.g. virtual slip 1313) is identified for selection responsive to a one-finger point 1332A of the finger gun 1332 towards the particular virtual note 1313. Further, the particular virtual note (e.g. virtual slip 1313) is selected responsive to a transition of physical arrangement of the finger gun 1332 from one-finger pointing 1332A towards the particular virtual note 1313 to inward curling of a finger 1342 used to perform the one-finger pointing 1332A. In one implementation, the one-finger pointing is performed using an index finger. In other implementations, a selected virtual note 1313 is deselected responsive to outward curling of the finger used to perform the one-finger pointing. In yet other implementations, the particular virtual note (e.g. virtual slip 1313) is selected responsive to a transition of physical arrangement of the finger gun 1332 from one-finger pointing 1332A towards the particular virtual note 1313 to inward curling of a thumb 1352 of the finger gun 1332.

In some implementations, selection of the virtual item (e.g. virtual slip 1313) can be indicated by modifying a presentation property of the virtual item (e.g. virtual slip 1313), including changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item (e.g. virtual slip 1313). As depicted in FIG. 13B, AR generation system 106 updates graphics of the virtual slip 1313 upon selection.

FIGS. 14A-14B are one implementation of selecting 1400A-1400B a particular virtual object (e.g. virtual slip 1414) in response to rotation 1432B of a hand 1432A. As shown in FIGS. 14A-14B, a particular virtual note (e.g. virtual slip 1414) is identified for selection responsive to a prone flat-hand hovering 1432A of a hand 1432 above the particular virtual note 1414 in the virtual modality 1402. Further, the particular virtual note (e.g. virtual slip 1414) is selected responsive to a transition of physical arrangement of the hand 1432 from pronation 1432A illustrated in FIG. 14A to supination 1432B illustrated in FIG. 14B, when the hand 1432 turns from a prone position 1432A to a supine position 1432B while hovering over the particular virtual note 1414. In other implementations, a selected virtual note 1414 is deselected responsive to supination of the prone flat-hand hovering over the particular virtual note 1414.
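The prone-to-supine rotation of FIGS. 14A-14B could be detected from the palm normal reported by a hand tracker, as in the hedged sketch below. The dot-product thresholds against the world "up" direction are assumptions about how the transition might be decided, not values taken from the disclosure.

```python
# Hypothetical pronation -> supination detector based on the palm normal.
def is_prone(palm_normal, threshold=-0.7):
    # Palm facing down: normal points roughly along -z (world "up" is +z here).
    return palm_normal[2] < threshold

def is_supine(palm_normal, threshold=0.7):
    # Palm facing up after supination: normal points roughly along +z.
    return palm_normal[2] > threshold

def select_on_supination(prev_normal, curr_normal, note):
    if is_prone(prev_normal) and is_supine(curr_normal):
        note["selected"] = True
    return note

note = {"id": "virtual slip 1414", "selected": False}
print(select_on_supination((0.0, 0.1, -0.99), (0.0, 0.1, 0.99), note))
```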

In some implementations, selection of the virtual item (e.g. virtual slip 1414) can be indicated by modifying a presentation property of the virtual item (e.g. virtual slip 1414), including changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item (e.g. virtual slip 1414). As depicted in FIG. 14B, AR generation system 106 updates graphics of the virtual slip 1414 upon selection.

Virtual Reality Operating System

The technology disclosed provides a rare opportunity to achieve a quantum leap in human-computer interaction by combining virtual reality interfaces with gestural inputs. The traditional paradigms of indirect interaction through standard input devices such as a mouse, keyboard, or stylus have their limitations, including skewed fields of view and restrictively receptive interfaces. The technology disclosed presents a virtual paradigm that can be used to create 3D user interface layers, applications, programs, and operating system APIs, which mimic and are analogous to pre-existing "windows, icons, menus, pointer" (WIMP) interactions and the operating system kernel.

In one implementation, a user can instantiate free-floating virtual interfaces, called "virtual modalities", such as screens and panels, and then interact with them using free-form gestures, as described above in this application. The technology disclosed allows a user to create any number of these virtual modalities and to assign them any dimension, size, shape, color, or orientation. In another implementation, these virtual modalities can be manipulated extrinsically via gestures such that the user can move them in the real-world space, close or remove them, leave them running, bring them to the forefront, split them, stack one over the other, or arrange them in a pattern or formation.

The technology disclosed further allows users to intrinsically operate a virtual desktop hosted by the virtual modalities in intuitive ways using gestures. For example, gestures can be used to perform traditional manipulations of virtual files, folders, text editors, spreadsheets, databases, paper sheets, recycling bins, windows, or clipboards that represent their pre-existing counterparts. Such manipulations can include the user picking up a virtual object and bringing it to a desired destination, running searches or flipping through items with their hands to find what is needed, trashing unwanted virtual items by picking them up and dropping them into the virtual recycling bin, pointing towards virtual song files to be played, pulling up a blank virtual paper and beginning to type, pulling down a virtual menu, selecting a virtual icon, rotating a 3D image for 360-degree inspection, moving forward into the windows envelope with a forward sweep, moving backward into the windows envelope with a backward sweep, bringing a first file icon into contact with an application or program icon using a two-hand inward swipe to open the corresponding file with the application or program, and the like.

Flowcharts

FIG. 15 illustrates one implementation of a method 1500 of selecting a virtual item from a virtual grid in a three-dimensional (3D) sensory space. At action 1502, a virtual grid is generated, optionally with a plurality of grid lines, and corresponding plurality of virtual items responsive to gestures in a three-dimensional (3D) sensory space. In one implementation, each virtual item is in visual correspondence with a different set of gridlines. In other implementations, the virtual grid is at least one of polygonic, circular, and globate.

At action 1512, a gesture is detected in the 3D sensory space and interpreted as selecting one of the virtual items.

At action 1522, the selection is automatically reported to a further computer-implemented process.

At action 1532, one or more applications linked to a virtual item in the virtual grid are invoked responsive to selection of the virtual item by the detected gesture.

At action 1542, the presentation of the virtual item is modified, including changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item. In one implementation, modifying the presentation of the virtual item includes augmenting the virtual item with additional graphics.
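A condensed sketch of actions 1502-1542 as a single flow is shown below. The grid representation, the detect_gesture, report, and invoke_linked_apps hooks, and the use of color as the modified presentation property are illustrative assumptions rather than elements defined by the method itself.

```python
# Hedged sketch of method 1500 as one pass through its actions.
def method_1500(detect_gesture, report, invoke_linked_apps):
    # Action 1502: generate a virtual grid of items responsive to gestures.
    grid = [{"id": i, "color": "default", "selected": False} for i in range(25)]

    # Action 1512: detect a gesture and interpret it as selecting an item.
    index = detect_gesture(grid)
    item = grid[index]
    item["selected"] = True

    # Action 1522: automatically report the selection to a further process.
    report(item)

    # Action 1532: invoke applications linked to the selected item.
    invoke_linked_apps(item)

    # Action 1542: modify the item's presentation (here, its color).
    item["color"] = "highlight"
    return item

result = method_1500(detect_gesture=lambda grid: 7,
                     report=print,
                     invoke_linked_apps=lambda item: None)
```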

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation may include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

FIG. 16 is a flowchart showing a method 1600 of navigating a virtual modality displaying a plurality of virtual items arranged in a grid. At action 1602, a first sweep of a control object is detected responsive to a first control gesture in a three-dimensional (3D) sensory space.

At action 1612, an extent of translation is defined along a first axis of a virtual grid in proportion to length of the first sweep of the control object.

At action 1622, a second sweep of the control object is detected responsive to a second control gesture in the 3D sensory space.

At action 1632, an extent of translation is defined along a second axis of the virtual grid in proportion to length of the second sweep of the control object, wherein the second axis is perpendicular to the first axis.

At action 1642, a virtual item in the virtual grid is automatically selected at a terminal end of the second sweep.

In one implementation, the first sweep is a horizontal sweep. In another implementation, the second sweep is a vertical sweep. In yet another implementation, the first sweep is a vertical sweep.

In one implementation, the second sweep is a horizontal sweep. In another implementation, the first sweep is a horizontal sweep. In yet another implementation, the second sweep is a diagonal sweep.

In one implementation, the first sweep is a diagonal sweep. In another implementation, the second sweep is a horizontal sweep. In yet another implementation, the first sweep is a vertical sweep.

In one implementation, the second sweep is a diagonal sweep. In another implementation, the first sweep is a diagonal sweep. In yet another implementation, the second sweep is a vertical sweep.

In some implementations, a third sweep of the control object is detected responsive to a third control gesture in the 3D sensory space. Also, an extent of translation is defined along a third axis of the virtual grid in proportion to length of the third sweep of the control object, wherein the third axis is perpendicular to the first and second axes. Further, a virtual item in the virtual grid is automatically selected at a terminal end of the third sweep.

In one implementation, the third sweep is a horizontal sweep. In another implementation, the third sweep is a vertical sweep. In yet another implementation, the third sweep is a diagonal sweep.

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. Other implementations may include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation may include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

Particular Implementations

In one implementation, a method of selecting a virtual item from a virtual grid in a three-dimensional (3D) sensory space is described. The method includes generating a virtual grid with a plurality of grid lines and corresponding plurality of virtual items responsive to gestures in a three-dimensional (3D) sensory space, wherein each virtual item is in visual correspondence with a different set of gridlines, detecting a gesture in the 3D sensory space and interpreting the gesture as selecting one of the virtual items, and automatically reporting the selection to a further computer-implemented process.

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

The method also includes invoking one or more applications linked to a virtual item in the virtual grid responsive to selection of the virtual item by the detected gesture. It further includes modifying a presentation property of a virtual item in the virtual grid responsive to selection of the virtual item by the detected gesture. In one implementation, modifying the presentation of the virtual item includes changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item. In another implementation, modifying the presentation of the virtual item includes augmenting the virtual item with additional graphics.

In one implementation, the virtual grid is polygonic. In another implementation, the virtual grid is circular. In yet another implementation, the virtual grid is globate.

The method further includes generating the virtual grid using at least holographic chip projectors.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In another implementation, a method of navigating a virtual modality displaying a plurality of virtual items arranged in a grid is described. The method includes detecting a first sweep of a control object responsive to a first control gesture in a three-dimensional (3D) sensory space, defining an extent of translation along a first axis of a virtual grid in proportion to length of the first sweep of the control object, detecting a second sweep of the control object responsive to a second control gesture in the 3D sensory space, defining an extent of translation along a second axis of the virtual grid in proportion to length of the second sweep of the control object, wherein the second axis is perpendicular to the first axis, and automatically selecting a virtual item in the virtual grid at a terminal end of the second sweep.

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

The method also includes invoking one or more applications linked to a virtual item in the virtual grid responsive to selection of the virtual item by the detected gesture. It further includes modifying a presentation property of a virtual item in the virtual grid responsive to selection of the virtual item by the detected gesture. In one implementation, modifying the presentation of the virtual item includes changing at least one of position, orientation, color, size, shape, texture, and transparency of at least a portion of the virtual item. In another implementation, modifying the presentation of the virtual item includes augmenting the virtual item with additional graphics.

In one implementation, the first sweep is a horizontal sweep. In another implementation, the second sweep is a vertical sweep.

In one implementation, the first sweep is a vertical sweep. In another implementation, the second sweep is a horizontal sweep.

In one implementation, the first sweep is a horizontal sweep. In another implementation, the second sweep is a diagonal sweep.

In one implementation, the first sweep is a diagonal sweep. In another implementation, the second sweep is a horizontal sweep.

In one implementation, the first sweep is a vertical sweep. In another implementation, the second sweep is a diagonal sweep.

In one implementation, the first sweep is a diagonal sweep.

In another implementation, the second sweep is a vertical sweep.

In one implementation, the method further includes generating the virtual grid using at least holographic chip projectors.

In another implementation, the method further includes detecting a third sweep of the control object responsive to a third control gesture in the 3D sensory space, defining an extent of translation along a third axis of the virtual grid in proportion to length of the third sweep of the control object, wherein the third axis is perpendicular to the first and second axes, and automatically selecting a virtual item in the virtual grid at a terminal end of the third sweep.

In one implementation, the third sweep is a horizontal sweep. In another implementation, the third sweep is a vertical sweep. In yet another implementation, the third sweep is a diagonal sweep.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In yet another implementation, a method of navigating a virtual modality displaying a plurality of virtual items arranged in a grid is described. The method includes detecting a horizontal sweep of a control object responsive to a first control gesture in a three-dimensional (3D) sensory space, defining a horizontal extent of translation along a first axis of a virtual grid in proportion to length of the horizontal sweep of the control object, detecting a vertical sweep of the control object responsive to a second control gesture in the 3D sensory space, defining a vertical extent of translation along a second axis of the virtual grid in proportion to length of the vertical sweep of the control object, wherein the second axis is perpendicular to the first axis, and automatically selecting a virtual item in the virtual grid at a terminal end of the vertical sweep. This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

In one implementation, the horizontal sweep is a right gesture. In another implementation, the horizontal sweep is a left gesture. In yet another implementation, the vertical sweep is an upward gesture. In a further implementation, the vertical sweep is a downward gesture.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In a further implementation, a method of navigating a virtual modality displaying a plurality of virtual items arranged in a grid is described. The method includes detecting a vertical sweep of a control object responsive to a first control gesture in a three-dimensional (3D) sensory space, defining a vertical extent of translation along a first axis of a virtual grid in proportion to length of the vertical sweep of the control object, detecting a horizontal sweep of the control object responsive to a second control gesture in the 3D sensory space, defining a horizontal extent of translation along a second axis of the virtual grid in proportion to length of the horizontal sweep of the control object, wherein the second axis is perpendicular to the first axis, and automatically selecting a virtual item in the virtual grid at a terminal end of the horizontal sweep.

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

In one implementation, the horizontal sweep is a right gesture. In another implementation, the horizontal sweep is a left gesture. In yet another implementation, the vertical sweep is an upward gesture. In a further implementation, the vertical sweep is a downward gesture.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In a further implementation, a method of navigating a virtual modality displaying a plurality of virtual items arranged in a grid is described. The method includes detecting a vertical sweep of a control object responsive to a first control gesture in a three-dimensional (3D) sensory space, defining a vertical extent of translation along a first axis of a virtual grid in proportion to length of the vertical sweep of the control object, detecting a diagonal sweep of the control object responsive to a second control gesture in the 3D sensory space, defining a diagonal extent of translation along a second axis of the virtual grid in proportion to length of the diagonal sweep of the control object, wherein the second axis is perpendicular to the first axis, and automatically selecting a virtual item in the virtual grid at a terminal end of the diagonal sweep.

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

In one implementation, the vertical sweep is an upward gesture. In another implementation, the vertical sweep is a downward gesture. In yet another implementation, the diagonal sweep is an upward gesture. In a further implementation, the diagonal sweep is a downward gesture.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In one implementation, a method of navigating a virtual modality displaying a plurality of virtual items arranged in a grid is described. The method includes detecting a horizontal sweep of a control object responsive to a first control gesture in a three-dimensional (3D) sensory space, defining a horizontal extent of translation along a first axis of a virtual grid in proportion to length of the horizontal sweep of the control object, detecting a diagonal sweep of the control object responsive to a second control gesture in the 3D sensory space, defining a diagonal extent of translation along a second axis of the virtual grid in proportion to length of the diagonal sweep of the control object, wherein the second axis is perpendicular to the first axis, and automatically selecting a virtual item in the virtual grid at a terminal end of the diagonal sweep.

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

In one implementation, the horizontal sweep is a right gesture. In another implementation, the horizontal sweep is a left gesture. In yet another implementation, the diagonal sweep is an upward gesture. In a further implementation, the diagonal sweep is a downward gesture.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In another implementation, a method of navigating a virtual modality displaying a plurality of virtual items arranged in a grid is described. The method includes detecting a horizontal sweep of a control object responsive to a first control gesture in a three-dimensional (3D) sensory space, defining a horizontal extent of translation along a first axis of a virtual grid in proportion to length of the horizontal sweep of the control object, detecting a vertical sweep of the control object responsive to a second control gesture in the 3D sensory space, defining a vertical extent of translation along a second axis of the virtual grid in proportion to length of the vertical sweep of the control object, wherein the second axis is perpendicular to the first axis, and automatically selecting a virtual item in the virtual grid at a terminal end of the vertical sweep responsive to a terminal gesture that transitions the control object from one physical arrangement to another.

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

In one implementation, the control object is a hand. In another implementation, physical arrangements of the control object include at least a flat hand with thumb parallel to fingers. In yet another implementation, physical arrangements of the control object include at least open, closed, and half-open.

In one implementation, physical arrangements of the control object include at least pinched, curled, and fisted. In another implementation, physical arrangements of the control object include at least mime gun, okay sign, thumbs-up, and ILY sign. In yet another implementation, physical arrangements of the control object include at least one-finger point, two-finger point, thumb point, and pinkie point.

In one implementation, the horizontal sweep is a right gesture. In another implementation, the horizontal sweep is a left gesture. In yet another implementation, the vertical sweep is an upward gesture. In a further implementation, the vertical sweep is a downward gesture.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In yet another implementation, a method of posting a virtual note from a collection of virtual notes arranged in a grid is described. The method includes generating a virtual grid with a plurality of virtual notes responsive to gestures in a three-dimensional (3D) sensory space, detecting a first set of gestures in the 3D sensory space along a first and second axis, navigating the virtual grid responsive to the first set of gestures to locate a particular virtual note in the virtual grid, wherein the first and second axes are perpendicular to each other, detecting a second set of gestures in the 3D sensory space along a third axis, and removing the particular virtual note from the virtual grid and posting the virtual note on a virtual location determined by a terminal gesture in the second set of gestures, wherein the third axis is perpendicular to the first and second axes.
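A minimal sketch of this note-posting flow follows, under assumed data structures: navigate the grid along two perpendicular axes to locate a note, then a terminal gesture (e.g. a flick) along the third axis removes the note from the grid and posts it at the determined location. The grid dictionary, the navigate and post_note helpers, and the gesture labels are hypothetical.

```python
# Hypothetical note-posting flow: locate, remove from grid, post at a location.
def navigate(grid_index, sweep_x, sweep_y, columns, rows):
    col = max(0, min(columns - 1, grid_index[0] + sweep_x))
    row = max(0, min(rows - 1, grid_index[1] + sweep_y))
    return (col, row)

def post_note(grid, index, destination, terminal_gesture):
    if terminal_gesture not in {"whole_hand_flick", "finger_flick",
                                "bunched_fingers_flick"}:
        return grid, None
    note = grid.pop(index)            # remove the note from the virtual grid
    note["posted_at"] = destination   # post it where the terminal gesture points
    return grid, note

grid = {(0, 0): {"text": "note A"}, (1, 0): {"text": "note B"}}
located = navigate((0, 0), sweep_x=1, sweep_y=0, columns=2, rows=1)
grid, posted = post_note(grid, located, destination=(0.4, 0.2, 0.8),
                         terminal_gesture="finger_flick")
print(posted)
```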

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

In one implementation, the terminal gesture is a flick of a whole hand. In another implementation, the terminal gesture is a flick of one of individual fingers or thumb of a hand. In yet another implementation, the terminal gesture is a flick of a set of bunched fingers or bunched fingers and thumb of a hand.

This method can be implemented at least partially with a databasesystem, e.g., by one or more processors configured to receive orretrieve information, process the information, store results, andtransmit the results. Other implementations may perform the actions indifferent orders and/or with different, fewer or additional actions thanthose discussed. Multiple actions can be combined in someimplementations. For convenience, this method is described withreference to the system that carries out a method. The system is notnecessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In one implementation, a method of selecting a virtual note from a collection of virtual notes arranged in a grid is described. The method includes generating a virtual grid with a plurality of virtual notes responsive to gestures in a three-dimensional (3D) sensory space, automatically selecting a particular virtual note from the virtual grid responsive to a prone inward scoop hand gesture in the 3D sensory space, including identifying the particular virtual note for selection responsive to positioning a hand behind the particular virtual note in the virtual grid and selecting the particular virtual note responsive to a transition of physical arrangement of the hand from a resting position behind the particular virtual note to an inward scoop towards the particular virtual note.
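
One way to read this selection is as a two-step state machine: the note is armed while the hand rests behind it, and committed when the pose becomes an inward scoop. The sketch below assumes a hypothetical upstream hand-pose classifier that labels each frame; the labels and record format are illustrative and do not come from the text above.

```python
# Minimal sketch of the rest-behind-then-scoop selection sequence.
from typing import Optional

def select_by_scoop(frames: list) -> Optional[str]:
    """frames: dicts like {'pose': 'prone_rest' | 'inward_scoop' | ...,
    'note_behind': note id or None}. Returns the note selected by a
    rest-behind-the-note -> inward-scoop transition, or None."""
    armed_note = None
    for frame in frames:
        if frame['pose'] == 'prone_rest' and frame['note_behind'] is not None:
            armed_note = frame['note_behind']   # hand resting behind a note
        elif frame['pose'] == 'inward_scoop' and armed_note is not None:
            return armed_note                   # the scoop completes the selection
        else:
            armed_note = None                   # any other pose breaks the sequence
    return None
```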

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

The method further includes deselecting a selected virtual note responsive to an outward scoop hand gesture.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In another implementation, a method of selecting a virtual note from a collection of virtual notes arranged in a grid is described. The method includes generating a virtual grid with a plurality of virtual notes responsive to gestures in a three-dimensional (3D) sensory space, defining a horizontal extent of navigation along a first axis of the virtual grid in proportion to the length of a horizontal sweep of a hand in the 3D sensory space, defining a vertical extent of navigation along a second axis of the virtual grid in proportion to the length of a vertical sweep of the hand in the 3D sensory space, wherein the second axis is perpendicular to the first axis, and automatically selecting a virtual item in the virtual grid at a terminal end of the vertical sweep responsive to a transition of physical arrangement of the hand from clenched fist to open hand.
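
A minimal sketch of the proportional navigation and the fist-to-open trigger follows, assuming a hypothetical gain constant (grid cells per metre of hand travel) and a pose classifier emitting 'fist'/'open_hand' labels; both are assumptions added for illustration only.

```python
# Sweep length maps linearly to navigation extent; selection fires on a
# fist-to-open-hand transition at the terminal end of the sweep.
CELLS_PER_METRE = 10.0   # hypothetical gain: grid cells per metre of hand travel

def cells_travelled(sweep_length_m: float) -> int:
    """Extent of navigation is proportional to the length of the sweep."""
    return round(sweep_length_m * CELLS_PER_METRE)

def selected_at_terminal_end(poses: list) -> bool:
    """True if the pose stream contains a clenched-fist -> open-hand
    transition, which the method above treats as the selection trigger."""
    return any(a == 'fist' and b == 'open_hand' for a, b in zip(poses, poses[1:]))

# Example: a 0.4 m vertical sweep moves the cursor 4 cells; selection fires
# because the pose stream ends with a fist-to-open transition.
rows_moved = cells_travelled(0.4)
selected = selected_at_terminal_end(['fist', 'fist', 'open_hand'])
```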

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

In one implementation, the horizontal sweep is a right gesture. In another implementation, the horizontal sweep is a left gesture. In a further implementation, the vertical sweep is an upward gesture. In yet another implementation, the vertical sweep is a downward gesture.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In yet another implementation, a method of selecting a virtual note from a collection of virtual notes arranged in a grid is described. The method includes generating a virtual grid with a plurality of virtual notes responsive to gestures in a three-dimensional (3D) sensory space, automatically selecting a particular virtual note from the virtual grid responsive to a pinching hand gesture in the 3D sensory space, including identifying the particular virtual note for selection responsive to flat-hand hovering of a hand above the particular virtual note in the virtual grid and selecting the particular virtual note responsive to a transition of physical arrangement of the hand from flat-hand hovering above the particular virtual note to pinching of its thumb and one or more fingers towards the particular virtual note.
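
The flat-hand-to-pinch transition can be approximated geometrically from tracked fingertip positions. The sketch below assumes positions in metres from a hand tracker and an illustrative 25 mm pinch threshold; neither value appears in the text above.

```python
# Geometric sketch of pinch detection over a hovered note.
import math

PINCH_THRESHOLD_M = 0.025   # illustrative thumb-to-fingertip closure distance

def is_pinching(thumb_tip, finger_tips) -> bool:
    """True when the thumb tip comes within the pinch threshold of any
    fingertip, i.e. the flat hand has closed into a pinch."""
    return any(math.dist(thumb_tip, tip) < PINCH_THRESHOLD_M for tip in finger_tips)

def pinch_select(hovered_note, was_flat_hand: bool, thumb_tip, finger_tips):
    """Select the hovered note only on a flat-hand -> pinch transition."""
    if hovered_note is not None and was_flat_hand and is_pinching(thumb_tip, finger_tips):
        return hovered_note
    return None
```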

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

The method further includes deselecting a selected virtual note responsive to an expanding hand gesture.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In one implementation, a method of selecting a virtual note from a collection of virtual notes arranged in a grid is described. The method includes generating a virtual grid with a plurality of virtual notes responsive to gestures in a three-dimensional (3D) sensory space, automatically selecting a particular virtual note from the virtual grid responsive to an open-hand gesture in the 3D sensory space, including identifying the particular virtual note for selection responsive to a one-finger point gesture of a hand towards the particular virtual note in the virtual grid and selecting the particular virtual note responsive to a transition of physical arrangement of the hand from one-finger pointing towards the particular virtual note to immediate opening of the hand above the particular virtual note.

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

The method further includes deselecting a selected virtual note responsive to clenching of the hand.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In another implementation, a method of selecting a virtual note from a collection of virtual notes arranged in a grid is described. The method includes generating a virtual grid with a plurality of virtual notes responsive to gestures in a three-dimensional (3D) sensory space, automatically selecting a particular virtual note from the virtual grid responsive to a fist clenching hand gesture in the 3D sensory space, including identifying the particular virtual note for selection responsive to flat-hand hovering of a hand above the particular virtual note in the virtual grid and selecting the particular virtual note responsive to a transition of physical arrangement of the hand from flat-hand hovering above the particular virtual note to curling of its thumb and fingers above the particular virtual note.

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

The method further includes deselecting a selected virtual note responsive to an expanding of thumb and fingers of the hand.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In yet another implementation, a method of selecting a virtual note from a collection of virtual notes arranged in a grid is described. The method includes generating a virtual grid with a plurality of virtual notes responsive to gestures in a three-dimensional (3D) sensory space, automatically selecting a particular virtual note from the virtual grid responsive to a spread-and-bunch-again gesture in the 3D sensory space, including identifying the particular virtual note for selection responsive to bunched-fingers hovering of a hand above the particular virtual note in the virtual grid and selecting the particular virtual note responsive to a transition of physical arrangement of the hand from bunched-fingers hovering above the particular virtual note to spreading apart of the fingers above the particular virtual note and to immediate bunching of the fingers.
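
The spread-and-bunch-again trigger is essentially a timed three-pose sequence. The sketch below assumes pose labels from an upstream classifier and a 400 ms window for what counts as "immediate" re-bunching; both are illustrative assumptions not stated in the text.

```python
# Timed bunched -> spread -> bunched sequence detector.
IMMEDIATE_WINDOW_S = 0.4   # assumed window for "immediate" re-bunching

def spread_and_bunch_again(samples: list) -> bool:
    """samples: list of (timestamp_s, pose) with pose in {'bunched', 'spread'}.
    True if the hand goes bunched -> spread -> bunched and the final bunching
    follows the spread within the immediate window."""
    spread_at = None
    was_bunched = False
    for t, pose in samples:
        if pose == 'bunched' and spread_at is None:
            was_bunched = True                      # initial bunched hover
        elif pose == 'spread' and was_bunched:
            spread_at = t                           # fingers spread apart
        elif pose == 'bunched' and spread_at is not None:
            return (t - spread_at) <= IMMEDIATE_WINDOW_S
    return False
```

The deselection variant described further below (a spread with no immediate re-bunching) falls out of the same loop: if the window elapses without a final 'bunched' sample, the function returns False.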

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

The method further includes deselecting a selected virtual note responsive to a spreading apart of the fingers above the particular virtual note without a subsequent immediate bunching of the fingers.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In one implementation, a method of selecting a virtual note from a collection of virtual notes arranged in a grid is described. The method includes generating a virtual grid with a plurality of virtual notes responsive to gestures in a three-dimensional (3D) sensory space, automatically selecting a particular virtual note from the virtual grid responsive to an okay hand gesture in the 3D sensory space, including identifying the particular virtual note for selection responsive to flat-hand hovering of a hand above the particular virtual note in the virtual grid and selecting the particular virtual note responsive to a transition of physical arrangement of the hand from flat-hand hovering above the particular virtual note to curling of its thumb and index finger above the particular virtual note and vertical expansion of other fingers.

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

The method further includes deselecting a selected virtual note responsive to paralleling of the thumb and the fingers.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In one implementation, a method of selecting a virtual note from a collection of virtual notes arranged in a grid is described. The method includes generating a virtual grid with a plurality of virtual notes responsive to gestures in a three-dimensional (3D) sensory space, generating for display a proximity indicator that provides visual feedback regarding proximity of a hand to a particular virtual note and escalation from proximity to contact of the hand with the particular virtual note, and automatically selecting the particular virtual note from the virtual grid when the hand approaches the virtual note within an initial hover proximity threshold.

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

In one implementation, the generated display further includes modifying size of the proximity indicator responsive to distance between the hand and the particular virtual note.

In another implementation, the generated display further includes modifying appearance of the proximity indicator responsive to distance between the hand and the particular virtual note.

In yet another implementation, the generated display further includes modifying shape of the proximity indicator responsive to distance between the hand and the particular virtual note.

In a further implementation, the generated display further includes modifying opacity of the proximity indicator responsive to distance between the hand and the particular virtual note.
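
One way to combine the hover-threshold selection with the distance-driven feedback variants above is a single mapping from hand-to-note distance to indicator parameters. The thresholds and the linear size/opacity mapping in this sketch are assumptions introduced for illustration, not values from the text.

```python
# Rendering-agnostic sketch of a proximity indicator driven by distance.
HOVER_THRESHOLD_M = 0.10    # assumed distance at which the note is selected
FEEDBACK_RANGE_M = 0.30     # assumed distance at which feedback starts to appear

def proximity_indicator(distance_m: float) -> dict:
    """Map hand-to-note distance to indicator size and opacity: the closer
    the hand, the larger and more opaque the indicator becomes."""
    closeness = max(0.0, min(1.0, (FEEDBACK_RANGE_M - distance_m) / FEEDBACK_RANGE_M))
    return {
        "radius_px": 10 + 40 * closeness,   # grows as the hand approaches
        "opacity": 0.2 + 0.8 * closeness,   # fades in with proximity
        "selected": distance_m <= HOVER_THRESHOLD_M,
    }

# Example: at 0.05 m the indicator is near full size and the note is selected.
state = proximity_indicator(0.05)
```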

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In yet another implementation, a method of selecting a virtual note from a collection of virtual notes arranged in a grid is described. The method includes generating a virtual grid with a plurality of virtual notes responsive to gestures in a three-dimensional (3D) sensory space, automatically selecting a particular virtual note from the virtual grid responsive to a finger gun gesture in the 3D sensory space, including identifying the particular virtual note for selection responsive to a one-finger point of the finger gun towards the particular virtual note in the virtual grid and selecting the particular virtual note responsive to a transition of physical arrangement of the finger gun from one-finger pointing towards the particular virtual note to inward curling of a finger used to perform the one-finger pointing.

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

In one implementation, the one-finger pointing is performed using an index finger. In another implementation, the method further includes selecting the particular virtual note responsive to a transition of physical arrangement of the finger gun from one-finger pointing towards the particular virtual note to inward curling of a thumb of the finger gun. In yet another implementation, the method further includes deselecting a selected virtual note responsive to outward curling of the finger used to perform the one-finger pointing.
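
A sketch of the finger-gun trigger follows, using per-finger curl values (0 for straight, 1 for fully curled) as a hypothetical hand-pose representation; the curl thresholds are illustrative, and the thumb branch covers the alternative implementation mentioned above.

```python
# Trigger-pull style selection from a finger-gun pose, plus the matching
# deselection on an outward (un)curl; all names and thresholds are assumptions.
CURL_FIRE = 0.6     # curled enough to count as a trigger pull
CURL_POINT = 0.2    # straight enough to count as pointing

def finger_gun_select(aimed_note, index_curl_prev, index_curl_now,
                      thumb_curl_prev=None, thumb_curl_now=None):
    """Select the note being pointed at when the index finger (or,
    alternatively, the thumb) curls inward from a pointing pose."""
    if aimed_note is None:
        return None
    index_fired = index_curl_prev <= CURL_POINT and index_curl_now >= CURL_FIRE
    thumb_fired = (thumb_curl_prev is not None and thumb_curl_now is not None
                   and thumb_curl_prev <= CURL_POINT and thumb_curl_now >= CURL_FIRE)
    return aimed_note if (index_fired or thumb_fired) else None

def finger_gun_deselect(index_curl_prev, index_curl_now) -> bool:
    """Deselect when the previously curled pointing finger uncurls outward."""
    return index_curl_prev >= CURL_FIRE and index_curl_now <= CURL_POINT
```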

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

In another implementation, a method of selecting a virtual note from a collection of virtual notes arranged in a grid is described. The method includes generating a virtual grid with a plurality of virtual notes responsive to gestures in a three-dimensional (3D) sensory space, automatically selecting a particular virtual note from the virtual grid responsive to a rotation of a hand in the 3D sensory space, including identifying the particular virtual note for selection responsive to a prone flat-hand hovering of a hand above the particular virtual note in the virtual grid and selecting the particular virtual note responsive to a transition of physical arrangement of the hand from pronation to supination when the hand turns from a prone position to a supine position while hovering over the particular virtual note.
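
The pronation-to-supination transition can be approximated by watching the palm-normal vector reported by a tracker. The dot-product threshold of 0.7 (roughly 45 degrees from straight up or down) is an assumption made for this sketch.

```python
# Palm-orientation test for a prone-to-supine hand rotation over a note.
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

UP = (0.0, 1.0, 0.0)   # assumed world "up" axis

def is_prone(palm_normal) -> bool:
    return _dot(palm_normal, UP) < -0.7   # palm facing down

def is_supine(palm_normal) -> bool:
    return _dot(palm_normal, UP) > 0.7    # palm facing up

def rotation_select(hovered_note, normal_prev, normal_now):
    """Select the hovered note when the flat hand turns from prone to supine."""
    if hovered_note is not None and is_prone(normal_prev) and is_supine(normal_now):
        return hovered_note
    return None
```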

This method and other implementations of the technology disclosed can include one or more of the following features and/or features described in connection with additional methods disclosed. In the interest of conciseness, the combinations of features disclosed in this application are not individually enumerated and are not repeated with each base set of features. The reader will understand how features identified in this section can readily be combined with sets of base features identified as implementations such as gesture recognition system, computer system, augmented reality, gestural interactions, virtual reality operating system, or flowcharts.

The method further includes deselecting a selected virtual note responsive to pronation of the supine flat-hand hovering over the particular virtual note.

This method can be implemented at least partially with a database system, e.g., by one or more processors configured to receive or retrieve information, process the information, store results, and transmit the results. Other implementations may perform the actions in different orders and/or with different, fewer or additional actions than those discussed. Multiple actions can be combined in some implementations. For convenience, this method is described with reference to the system that carries out a method. The system is not necessarily part of the method.

Other implementations can include a non-transitory computer readable storage medium storing instructions executable by a processor to perform any of the methods described above. Yet another implementation can include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform any of the methods described above.

The terms and expressions employed herein are used as terms and expressions of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding any equivalents of the features shown and described or portions thereof. In addition, having described certain implementations of the technology disclosed, it will be apparent to those of ordinary skill in the art that other implementations incorporating the concepts disclosed herein can be used without departing from the spirit and scope of the technology disclosed. Accordingly, the described implementations are to be considered in all respects as only illustrative and not restrictive.

What is claimed is:
1. A method including: creating a real-time digital representation of a real-world physical environment in which a user of a head mounted device is positioned, the environment further comprising a 3D sensory space with a defined volume of interest; generating, for continuous display by the head mounted device, a live video stream including the real-time digital representation of the real-world physical environment; providing the user of the head mounted device with the generated live video stream; identifying, by the head mounted device, a virtual interactive item, from a library of virtual interactive items, that corresponds to and provides information about a real-world marker identified from the live video stream; generating, for display, (i) 3D virtual imagery including virtual imagery corresponding to the identified virtual interactive item, and (ii) virtual imagery corresponding to a plurality of virtual items responsive to gestures in the 3D sensory space, wherein the generated 3D virtual imagery is superimposed, as a free-floating virtual modality in the real-world physical environment, allowing the user to simultaneously view both the 3D virtual imagery and the real-world physical environment; providing the user of the head mounted device with the generated 3D virtual imagery; detecting, using images provided in the generated live video stream, a gesture in the defined volume of interest of the 3D sensory space by the user of the head mounted device; interpreting the detected gesture as selecting one virtual item from the library of virtual interactive items; and executing an action associated with the selected virtual item.
2. The method of claim 1, further including superimposing, for display by the head mounted device, the identified virtual interactive item onto the corresponding identified real-world marker.
3. The method of claim 1, wherein the head mounted device includes one or more projectors that project imagery into the real-world physical environment, and wherein the method further includes: projecting, by the one or more projectors of the head mounted device, the identified virtual interactive item onto the identified real-world marker in the real-world physical environment.
4. The method of claim 1, wherein: the identifying of the virtual interactive item identifies two or more virtual interactive items, from the library of virtual interactive items, that correspond to and provide information about the identified real-world marker; and the identified two or more virtual interactive items are included in the 3D virtual imagery generated for display.
5. The method of claim 1, wherein the detected gesture is a scooping gesture in which a representation of a hand of a user appears to start from a position behind a virtual item and then proceed in a motion that appears to scoop up the virtual item from behind.
6. The method of claim 1, wherein the identified virtual interactive item is superimposed in place of the identified real-world marker, so as to replace the identified real-world marker from view of the user.
7. The method of claim 1 wherein the real-world marker comprises at least one of a two- or three-dimensional barcode.
8. The method of claim 1 wherein the real-world marker comprises an image of a real-world item.
9. A non-transitory computer-readable recording medium having computer instructions recorded thereon, the computer instructions, when executed by one or more processors, cause the one or more processors to perform the method of claim 1.
10. A system comprising a memory storing computer instructions and one or more processors, the computer instructions, when executed by the one or more processors, cause the one or more processors to perform the method of claim 1.